Methods and apparatus to adjust demographic information of user accounts to reflect primary users of the user accounts

ABSTRACT

Methods, apparatus, systems, and articles of manufacture are disclosed to adjust demographic information of user accounts to reflect primary users of the user accounts. An example apparatus includes memory, programmable circuitry, and instructions in the memory, the instructions to cause the programmable circuitry to at least access impression data associated with a user account registered with a database proprietor, the user account associated with first demographics at a database of the database proprietor. The example programmable circuitry is also to determine a primary user of the user account based on the impression data and based on second demographics of multiple users of the user account, the multiple users including the primary user. Additionally, the example programmable circuitry is to modify the first demographics associated with the user account based on at least some of the second demographics, the at least some of the second demographics corresponding to the primary user.

RELATED APPLICATIONS

This patent arises from a continuation of PCT Patent Application No. PCT/US2021/032062 and U.S. patent application Ser. No. 17/318,766, and claims priority to PCT Patent Application No. PCT/US2021/032062, filed May 12, 2021, and U.S. patent application Ser. No. 17/318,766, filed May 12, 2021. PCT Patent Application No. PCT/US2021/032062 and U.S. patent application Ser. No. 17/318,766 claim the benefit of U.S. Provisional Patent Application No. 63/024,260, filed May 13, 2020. PCT Patent Application No. PCT/US2021/032062; U.S. patent application Ser. No. 17/318,766; and U.S. Provisional Patent Application No. 63/024,260 are hereby incorporated herein by reference in their entireties. Priority to PCT Patent Application No. PCT/US2021/032062; U.S. patent application Ser. No. 17/318,766; and U.S. Provisional Patent Application No. 63/024,260 is hereby claimed.

Additionally, U.S. patent application Ser. No. 17/316,168, entitled “METHODS AND APPARATUS TO GENERATE COMPUTER-TRAINED MACHINE LEARNING MODELS TO CORRECT COMPUTER-GENERATED ERRORS IN AUDIENCE DATA,” which was filed on May 10, 2021, U.S. patent application Ser. No. 17/317,404, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. 17/317,461, entitled “METHODS AND APPARATUS FOR MULTI-ACCOUNT ADJUSTMENT IN THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. 17/317,616, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 11, 2021, U.S. patent application Ser. No. 17/318,420, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 12, 2021, and U.S. patent application Ser. No. 17/318,517, entitled “METHODS AND APPARATUS TO GENERATE AUDIENCE METRICS USING THIRD-PARTY PRIVACY-PROTECTED CLOUD ENVIRONMENTS,” which was filed on May 12, 2021, are hereby incorporated herein by reference in their entireties.

FIELD OF THE DISCLOSURE

This disclosure relates generally to computer systems for monitoring audiences, and, more particularly, to methods and apparatus to adjust demographic information of user accounts to reflect primary users of the user accounts.

BACKGROUND

Audience measurement entities (AMEs) collect audience measurement information from panelists (e.g., individuals who agree to be monitored by the AMEs) including the number of unique audience members for particular media and the number of impressions of the media corresponding to each of the audience members. In some examples, AMEs utilize third-party cookies (e.g., where the AMEs are third parties relative to the entity serving media to a client device) to collect audience measurement information. In such examples, an AME may issue an impression request to the entity serving the media to client devices. Third-party cookie tracking is used by measurement entities to track access to media by client devices from first-party media servers.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example system to enable the generation of audience measurement metrics based on the merging of data collected by a database proprietor and an AME.

FIG. 2 is a block diagram illustrating the example system of FIG. 1 with different aspects of the system of FIG. 1 emphasized for clarity including an example impression-to-user analyzer.

FIG. 3 is a flowchart representative of machine readable instructions which may be executed to implement the privacy-protected cloud environment of FIG. 2.

FIG. 4 is a flowchart representative of machine readable instructions which may be executed to implement the example impression-to-user analyzer of FIG. 2 to determine primary users from among multiple potential users of panelist user accounts of the database proprietor 102.

FIG. 5 is another flowchart representative of machine readable instructions which may be executed to implement the example impression-to-user analyzer of FIG. 2 to determine primary users of panelist user accounts of the database proprietor.

FIG. 6 is a block diagram illustrating the example system of FIG. 1 with different aspects of the system of FIG. 1 emphasized for clarity.

FIG. 7 is a block diagram of an example processing platform structured to execute the instructions of FIGS. 3, 4, and/or 5 to implement the privacy-protected cloud environment and/or the impression-to-user analyzer of FIG. 2.

The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. As used herein, connection references (e.g., attached, coupled, connected, and joined) may include intermediate members between the elements referenced by the connection reference and/or relative movement between those elements unless otherwise indicated. As such, connection references do not necessarily infer that two elements are directly connected and/or in fixed relation to each other.

Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc. are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name. As used herein, “approximately” and “about” refer to dimensions that may not be exact due to manufacturing tolerances and/or other real world imperfections. As used herein “substantially real time” refers to occurrence in a near instantaneous manner recognizing there may be real world delays for computing time, transmission, etc. Thus, unless otherwise specified, “substantially real time” refers to real time +/− 1 second.

DETAILED DESCRIPTION

AMEs usually collect large amounts of audience measurement information from their panelists including the number of unique audience members for particular media and the number of impressions corresponding to each of the audience members. Unique audience size, as used herein, refers to the total number of unique people (e.g., non-duplicate people) who had an impression of (e.g., were exposed to) a particular media item, without counting duplicate audience members. As used herein, an impression is defined to be an event in which a home or individual accesses and/or is exposed to media (e.g., an advertisement, content, a group of advertisements and/or a collection of content). Impression count, as used herein, refers to the number of times audience members are exposed to a particular media item. The unique audience size associated with a particular media item will always be equal to or less than the number of impressions associated with the media item because, while all audience members by definition have at least one impression of the media, an individual audience member may have more than one impression. That is, the unique audience size is equal to the impression count only when every audience member was exposed to the media only a single time (i.e., the number of audience members equals the number of impressions). Where at least one audience member is exposed to the media multiple times, the unique audience size will be less than the total impression count because multiple impressions will be associated with individual audience members. Thus, unique audience size refers to the number of unique people in an audience (without double counting any person) exposed to media for which audience metrics are being generated. Unique audience size may also be referred to as unique audience, deduplicated audience size, deduplicated audience, or audience.
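
By way of illustration only, the relationship between impression count and unique audience size may be sketched in Python as follows. The impression log, the person identifiers, and the function name audience_metrics are hypothetical and are not part of the disclosed examples.

```python
from collections import Counter

# Hypothetical impression log: one (person_id, media_id) pair per impression.
impressions = [
    ("person_a", "ad_1"),
    ("person_b", "ad_1"),
    ("person_a", "ad_1"),  # second exposure of the same person to ad_1
    ("person_c", "ad_1"),
    ("person_a", "ad_2"),
]


def audience_metrics(log, media_id):
    """Return (impression_count, unique_audience_size) for one media item."""
    # Count impressions per person for the requested media item only.
    per_person = Counter(person for person, media in log if media == media_id)
    impression_count = sum(per_person.values())  # every exposure counts
    unique_audience = len(per_person)            # each person counts once
    return impression_count, unique_audience


count, unique = audience_metrics(impressions, "ad_1")
print(f"impressions={count}, unique audience={unique}")
```

In this sketch the unique audience size can never exceed the impression count, because each person contributes at least one impression and is counted only once in the audience.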

Techniques for monitoring user access to Internet-accessible media, such as digital television (DTV) media and digital content ratings (DCR) media, have evolved significantly over the years. Internet-accessible media is also known as digital media. In the past, such monitoring was done primarily through server logs. In particular, media providers serving media on the Internet would log the number of requests received for their media at their servers. Basing Internet usage research on server logs is problematic for several reasons. For example, server logs can be tampered with either directly or via zombie programs, which repeatedly request media from the server to increase the server log counts. Also, media is sometimes retrieved once, cached locally and then repeatedly accessed from the local cache without involving the server. Server logs cannot track such repeat views of cached media. Thus, server logs are susceptible to both over-counting and under-counting errors.

As Internet technology advanced, the limitations of server logs were overcome through methodologies in which the Internet media to be tracked was tagged with monitoring instructions. In particular, monitoring instructions (also known as a media impression request or a beacon request) are associated with the hypertext markup language (HTML) of the media to be tracked. When a client requests the media, both the media and the impression request are downloaded to the client. The impression requests are, thus, executed whenever the media is accessed, be it from a server or from a cache.

The beacon instructions cause monitoring data reflecting information about the access to the media (e.g., the occurrence of a media impression) to be sent from the client that downloaded the media to a monitoring server. Typically, the monitoring server is owned and/or operated by an AME (e.g., any party interested in measuring or tracking audience exposures to advertisements, media, and/or any other media) that did not provide the media to the client and who is a trusted third party for providing accurate usage statistics (e.g., The Nielsen Company, LLC). Advantageously, because the beaconing instructions are associated with the media and executed by the client browser whenever the media is accessed, the monitoring information is provided to the AME irrespective of whether the client is associated with a panelist of the AME. In this manner, the AME is able to track every time a person is exposed to the media on a census-wide or population-wide level. As a result, the AME can reliably determine the total impression count for the media without having to extrapolate from panel data collected from a relatively limited pool of panelists within the population. Frequently, such beacon requests are implemented in connection with third-party cookies. Since the AME is a third party relative to the first party serving the media to the client device, the cookie sent to the AME in the impression request to report the occurrence of the media impression of the client device is a third-party cookie. Third-party cookie tracking is used by audience measurement servers to track access to media by client devices from first-party media servers.

Tracking impressions by tagging media with beacon instructions using third-party cookies is insufficient, by itself, to enable an AME to reliably determine the unique audience size associated with the media if the AME cannot identify the individual user associated with the third-party cookie. That is, the unique audience size cannot be determined because the collected monitoring information does not uniquely identify the person(s) exposed to the media. Under such circumstances, the AME cannot determine whether two reported impressions are associated with the same person or two separate people. The AME may set a third-party cookie on a client device reporting the monitoring information to identify when multiple impressions occur using the same device. However, cookie information does not indicate whether the same person used the client device in connection with each media impression. Furthermore, the same person may access media using multiple different devices that have different cookies so that the AME cannot directly determine when two separate impressions are associated with the same person or two different people.

Furthermore, the monitoring information reported by a client device executing the beacon instructions does not provide an indication of the demographics or other user information associated with the person(s) exposed to the associated media. To at least partially address this issue, the AME establishes a panel of users who have agreed to provide their demographic information and to have their Internet browsing activities monitored. When an individual joins the panel, that person provides corresponding detailed information concerning the person's identity and demographics (e.g., gender, race, income, home location, occupation, etc.) to the AME. The AME sets a cookie on the panelist computer that enables the AME to identify the panelist whenever the panelist accesses tagged media and, thus, sends monitoring information to the AME. Additionally or alternatively, the AME may identify the panelists using other techniques (independent of cookies) by, for example, prompting the user to log in or identify themselves. While AMEs are able to obtain user-level information for impressions from panelists (e.g., identify unique individuals associated with particular media impressions), most of the client devices providing monitoring information from the tagged pages are not associated with panelists. Thus, the identity of most people accessing media remains unknown to the AME such that it is necessary for the AME to use statistical methods to impute demographic information based on the data collected for panelists to the larger population of users providing data for the tagged media. However, panel sizes of AMEs remain small compared to the general population of users.

There are many database proprietors operating on the Internet. These database proprietors provide services to large numbers of subscribers. Examples of such database proprietors include social network sites (e.g., Facebook, Twitter, MySpace, etc.), multi-service sites (e.g., Yahoo!, Google, Axiom, Catalina, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), credit reporting sites (e.g., Experian), streaming media sites (e.g., YouTube, Hulu, etc.), etc. In exchange for the provision of services, the subscribers register with the database proprietors. As used herein, the term “registered user” refers to an individual who has established a user account with a database proprietor (e.g., the individual who subscribes to the database proprietor). Database proprietors set cookies and/or other device/user identifiers on the client devices of their registered users to enable the database proprietors to recognize their registered users when their registered users visit website(s) on the Internet domains of the database proprietors.

The protocols of the Internet make cookies inaccessible outside of the domain (e.g., Internet domain, domain name, etc.) on which they were set. Thus, a cookie set in, for example, the YouTube.com domain (e.g., a first party) is accessible to servers in the YouTube.com domain, but not to servers outside that domain. Therefore, although an AME (e.g., a third party) might find it advantageous to access the cookies set by the database proprietors, it is unable to do so. However, techniques have been developed that enable an AME to leverage media impression information collected in association with demographic information in registered user databases of database proprietors to collect more extensive Internet usage (e.g., beyond the limited pool of individuals participating in an AME panel) by extending the impression request process to encompass partnered database proprietors and by using such partners as interim data collectors. In particular, this task is accomplished by structuring the AME to respond to impression requests from clients (who may not be members of an audience measurement panel and, thus, may be unknown to the AME) by redirecting the clients from the AME to a database proprietor, such as a social network site partnered with the AME, using an impression response. Such a redirection initiates a communication session between the client accessing the tagged media and the database proprietor. For example, the impression response received from the AME may cause the client to send a second impression request to the database proprietor along with a cookie set by that database proprietor. In response to receiving this impression request, the database proprietor (e.g., Facebook) can access the cookie it has set on the client to thereby identify the client based on the internal records of the database proprietor.
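
For purposes of explanation, the redirect-based exchange described above may be modeled with a minimal, in-memory Python sketch. No network traffic is involved. The class names, the domain names ame.example and proprietor.example, the cookie values, and the user identifiers are all hypothetical, and the assumption that the AME logs the impression before redirecting is made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    # Cookies keyed by domain: a server can read only the cookie for its own domain.
    cookies: dict = field(default_factory=dict)

    def cookie_for(self, domain):
        return self.cookies.get(domain)


@dataclass
class DatabaseProprietor:
    domain: str
    registered_users: dict  # hypothetical internal records: cookie value -> user id
    demographic_impressions: list = field(default_factory=list)

    def handle_impression_request(self, client, media_id):
        # The proprietor reads the cookie it set on the client (first-party access).
        user = self.registered_users.get(client.cookie_for(self.domain))
        if user is not None:
            self.demographic_impressions.append((user, media_id))
        return user


@dataclass
class AME:
    domain: str
    partner: DatabaseProprietor
    impressions: list = field(default_factory=list)

    def handle_impression_request(self, client, media_id):
        # Assumed for illustration: the AME logs the impression, then redirects.
        self.impressions.append(media_id)
        return {"redirect_to": self.partner.domain, "media_id": media_id}


proprietor = DatabaseProprietor("proprietor.example", {"cookie-123": "user-42"})
ame = AME("ame.example", proprietor)
client = Client(cookies={"proprietor.example": "cookie-123"})

# First impression request goes to the AME, which answers with a redirect.
response = ame.handle_impression_request(client, "ad_1")
# The redirect causes a second impression request to the database proprietor.
matched_user = proprietor.handle_impression_request(client, response["media_id"])
print(matched_user)
```

The sketch keeps the cookie lookup inside the proprietor's handler to reflect that only the domain that set a cookie can read it; the AME handler never inspects the proprietor's cookie.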

In the event the client corresponds to a registered user of the database proprietor (as determined from the cookie associated with the client), the database proprietor logs/records a database proprietor demographic impression in association with the client/user. As used herein, a demographic impression is an impression that can be matched to particular demographic information of a particular registered user of the services of a database proprietor. The database proprietor has the demographic information for the particular registered user because the registered user would have provided such information when setting up an account to subscribe to the services of the database proprietor.

Sharing of demographic information associated with registered users of database proprietors enables AMEs to extend or supplement their panel data with substantially reliable demographics information from external sources (e.g., database proprietors), thus extending the coverage, accuracy, and/or completeness of their demographics-based audience measurements. Such access also enables the AME to monitor persons who would not otherwise have joined an AME panel. Any web service provider having a database identifying demographics of a set of individuals may cooperate with the AME. Such web service providers may be referred to as “database proprietors” and include, for example, wireless service carriers, mobile software/service providers, social media sites (e.g., Facebook, Twitter, MySpace, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), multi-service sites (e.g., Yahoo!, Google, Experian, etc.), and/or any other Internet sites that collect demographic data of users and/or otherwise maintain user registration records. The use of demographic information from disparate data sources (e.g., high-quality demographic information from the panels of an audience measurement entity and/or registered user data of database proprietors) results in improved reporting effectiveness of metrics for both online and offline advertising campaigns.

The above approach to generating audience metrics by an AME depends upon the beacon requests (or tags) associated with the media to be monitored to enable an AME to obtain census-wide impression counts (e.g., impressions that include the entire population exposed to the media regardless of whether the audience members are panelists of the AME). Further, the above approach also depends on third-party cookies to enable the enrichment of the census impressions with demographic information from database proprietors. However, in more recent years, there has been a movement away from the use of third-party cookies by third parties. Thus, while media providers (e.g., database proprietors) may still use first-party cookies to collect first-party data, the elimination of third-party cookies prevents the tracking of Internet media by AMEs (outside of client devices associated with panelists for which the AME has provided a meter to track Internet usage behavior). Furthermore, independent of the use of cookies, some database proprietors are moving towards the elimination of third-party impression requests or tags (e.g., redirect instructions) embedded in media (e.g., beginning in 2020, third-party tags will no longer be allowed on YouTube.com and other Google Video Partner (GVP) sites). As technology moves in this direction, AMEs (e.g., third parties) will no longer be able to track census-wide impressions of media in the manner they have in the past. Furthermore, AMEs will no longer be able to send a redirect request to a client accessing media to cause a second impression request to a database proprietor to associate the impression with demographic information. Thus, the only Internet media monitoring that AMEs will be able to directly perform in such a system will be with panelists that have agreed to be monitored using different techniques that do not depend on third-party cookies and/or tags.

Examples disclosed herein overcome at least some of the limitations that arise out of the elimination of third-party cookies and/or third-party tags by enabling the merging of high-quality demographic information from the panels of an AME with media impression data that continues to be collected by database proprietors. As mentioned above, while third-party cookies and/or third-party tags may be eliminated, database proprietors that provide and/or manage the delivery of media accessed online are still able to track impressions of the media (e.g., via first-party cookies and/or first-party tags). Furthermore, database proprietors are still able to associate demographic information with the impressions whenever the impressions can be matched to a particular registered user of the database proprietor for which demographic information has been collected (e.g., when the user registered with the database proprietor). In some examples, the AME panel data and the database proprietor impressions data are merged in a privacy-protected cloud environment maintained by the database proprietor.

FIG. 1 is a block diagram illustrating an example system 100 to enable the generation of audience measurement metrics based on the merging of data collected by a database proprietor 102 and an AME 104. More particularly, in some examples, the data includes AME panel data (that includes media impressions for panelists that are associated with high-quality demographic information collected by the AME 104) and database proprietor impressions data (which may be enriched with demographic and/or other information available to the database proprietor 102). In the illustrated example, these disparate sources of data are combined within a privacy-protected cloud environment 106 managed and/or maintained by the database proprietor 102. The privacy-protected cloud environment 106 is a cloud-based environment that enables media providers (e.g., advertisers and/or content providers) and third parties (e.g., the AME 104) to input and combine their data with data from the database proprietor 102 inside a data warehouse or data store that enables efficient big data analysis. The combining of data from different parties (e.g., different Internet domains) presents risks to the privacy of the data associated with individuals represented by the data from the different parties. Accordingly, the privacy-protected cloud environment 106 is established with privacy constraints that prevent any associated party (including the database proprietor 102) from accessing private information associated with particular individuals. Rather, any data extracted from the privacy-protected cloud environment 106 following a big data analysis and/or query is limited to aggregated information. A specific example of the privacy-protected cloud environment 106 is the Ads Data Hub (ADH) developed by Google.
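
As a non-limiting illustration of limiting query output to aggregated information, the following Python sketch returns only per-group totals and suppresses groups that contain too few distinct users. The minimum group size, the demographic labels, and the function name aggregate_impressions are assumptions introduced for illustration; the disclosure does not specify how the privacy-protected cloud environment 106 enforces its privacy constraints.

```python
from collections import defaultdict

# Hypothetical threshold, not taken from the disclosure.
MIN_GROUP_SIZE = 3


def aggregate_impressions(rows, min_group_size=MIN_GROUP_SIZE):
    """Aggregate (user_id, demographic_group) rows into per-group totals.

    Only groups with at least min_group_size distinct users are returned,
    and user identifiers never appear in the output.
    """
    users = defaultdict(set)
    counts = defaultdict(int)
    for user_id, group in rows:
        users[group].add(user_id)
        counts[group] += 1
    return {
        group: {"unique_users": len(members), "impressions": counts[group]}
        for group, members in users.items()
        if len(members) >= min_group_size
    }


rows = [
    ("u1", "F18-24"),
    ("u2", "F18-24"),
    ("u3", "F18-24"),
    ("u1", "F18-24"),
    ("u4", "M25-34"),  # a single user: this group is suppressed
]
summary = aggregate_impressions(rows)
print(summary)
```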

As used herein, a media impression is defined as an occurrence of access and/or exposure to media 108 (e.g., an advertisement, a movie, a movie trailer, a song, a web page banner, etc.). Examples disclosed herein may be used to monitor for media impressions of any one or more media types (e.g., video, audio, a web page, an image, text, etc.). In examples disclosed herein, the media 108 may be primary content and/or advertisements. Examples disclosed herein are not restricted for use with any particular type of media. On the contrary, examples disclosed herein may be implemented in connection with tracking impressions for media of any type or form in a network.

In the illustrated example of FIG. 1, content providers and/or advertisers distribute the media 108 via the Internet to users that access websites and/or online television services (e.g., web-based TV, Internet protocol TV (IPTV), etc.). For purposes of explanation, examples disclosed herein are described assuming the media 108 is an advertisement that may be provided in connection with particular content of primary interest to a user. In some examples, the media 108 is served by media servers managed by and/or associated with the database proprietor 102 that manages and/or maintains the privacy-protected cloud environment 106. For example, the database proprietor 102 may be Google, and the media 108 corresponds to ads served with videos accessed via YouTube.com and/or via other Google video partners (GVPs). More generally, in some examples, the database proprietor 102 includes corresponding database proprietor servers that can serve media 108 to individuals via client devices 110. In the illustrated example of FIG. 1, the client devices 110 may be stationary or portable computers, handheld computing devices, smart phones, Internet appliances, smart televisions, and/or any other type of device that may be connected to the Internet and capable of presenting media. For purposes of explanation, the client devices 110 of FIG. 1 include panelist client devices 112 and non-panelist client devices 114 to indicate that at least some individuals that access and/or are exposed to the media 108 correspond to panelists who have provided detailed demographic information to the AME 104 and have agreed to enable the AME 104 to track their exposure to the media 108. In many situations, other individuals who are not panelists will also be exposed to the media 108 (e.g., via the non-panelist client devices 114). Typically, the number of non-panelist audience members for a particular media item will be significantly greater than the number of panelist audience members. In some examples, the panelist client devices 112 may include and/or implement an audience measurement meter 115 that captures the impressions of media 108 accessed by the panelist client devices 112 (along with associated information) and reports the same to the AME 104. In some examples, the audience measurement meter 115 may be a separate device from the panelist client device 112 used to access the media 108.

In some examples, the media 108 is associated with a unique impression identifier (e.g., a consumer playback nonce (CPN)) generated by the database proprietor 102. In some examples, the impression identifier serves to uniquely identify a particular impression of the media 108. Thus, even though the same media 108 may be served multiple times, each time the media 108 is served the database proprietor 102 will generate a new and different impression identifier so that each impression of the media 108 can be distinguished from every other impression of the media. In some examples, the impression identifier is encoded into a uniform resource locator (URL) used to access the primary content (e.g., a particular YouTube video) along with which the media 108 (as an advertisement) is served. In some examples, with the impression identifier (e.g., CPN) encoded into the URL associated with the media 108, the audience measurement meter 115 extracts the identifier at the time that a media impression occurs so that the AME 104 is able to associate a captured impression with the impression identifier.
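
By way of example only, extraction of an impression identifier from a URL may be sketched in Python as follows. The disclosure does not state how the identifier is encoded into the URL; the sketch assumes, purely for illustration, that it is carried as a query parameter named cpn, and the URLs shown are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def extract_impression_id(url, param="cpn"):
    """Return the impression identifier carried in the URL query, or None.

    The parameter name "cpn" is an assumption made for this sketch.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical URLs for illustration.
with_id = "https://video.example/watch?v=abc123&cpn=XyZ_0987"
without_id = "https://video.example/watch?v=abc123"

print(extract_impression_id(with_id))
print(extract_impression_id(without_id))
```

Returning None when the parameter is absent lets a caller distinguish impressions for which no identifier could be obtained at capture time.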

In some examples, the meter 115 may not be able to obtain the impression identifier (e.g., CPN) to associate with a particular media impression. For instance, in some examples where the panelist client device 112 is a mobile device, the meter 115 collects a mobile advertising identifier (MAID) and/or an identifier for advertisers (IDFA) that may be used to uniquely identify client devices 110 (e.g., the panelist client devices 112 being monitored by the AME 104). In some examples, the meter 115 reports the MAID and/or IDFA for the particular device associated with the meter 115 to the AME 104. The AME 104, in turn, provides the MAID and/or IDFA to the database proprietor 102 in a double blind exchange through which the database proprietor 102 provides the AME 104 with the impression identifiers (e.g., CPNs) associated with the client device 110 identified by the MAID and/or IDFA. Once the AME 104 receives the impression identifiers for the client device 110 (e.g., a particular panelist client device 112), the impression identifiers are associated with the impressions previously collected in connection with the device.

In the illustrated example, the database proprietor 102 logs each media impression occurring on any of the client devices 110 within the privacy-protected cloud environment 106. In some examples, logging an impression includes logging the time the impression occurred and the type of client device 110 (e.g., whether a desktop device, a mobile device, a tablet device, etc.) on which the impression occurred. Further, in some examples, impressions are logged along with the impression's unique impression identifier. In this example, the impressions and associated identifiers are logged in a campaign impressions database 116. The campaign impressions database 116 stores all impressions of the media 108 regardless of whether any particular impression was detected from a panelist client device 112 or a non-panelist client device 114. Furthermore, the campaign impressions database 116 stores all impressions of the media 108 regardless of whether the database proprietor 102 is able to match any particular impression to a particular registered user of the database proprietor 102. As mentioned above, in some examples, the database proprietor 102 identifies a particular registered user (e.g., subscriber) associated with a particular media impression based on a cookie stored on the client device 110. In some examples, the database proprietor 102 associates a particular media impression with a registered user that was signed into the online services of the database proprietor 102 at the time the media impression occurred. In some examples, in addition to logging such impressions and associated identifiers in the campaign impressions database 116, the database proprietor 102 separately logs such impressions in a matchable impressions database 118. As used herein, a matchable impression is an impression that the database proprietor 102 is able to match to at least one of a particular registered user (e.g., because the impression occurred on a client device 110 on which a registered user was signed into the database proprietor 102) or a particular client device 110 (e.g., based on a first-party cookie of the database proprietor 102 detected on the client device 110). In some examples, if the database proprietor 102 cannot match a particular media impression (e.g., because no registered user was signed in at the time the media impression occurred and there is no recognizable cookie on the associated client device 110), the impression is omitted from the matchable impressions database 118 but is still logged in the campaign impressions database 116.
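
The logging behavior described above may be sketched, for purposes of illustration only, as the following Python routine. The Impression record, its field names, and the use of in-memory lists to stand in for the campaign impressions database 116 and the matchable impressions database 118 are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Impression:
    impression_id: str                        # unique per impression (e.g., a CPN)
    timestamp: str                            # time the impression occurred
    device_type: str                          # e.g., "desktop", "mobile", "tablet"
    signed_in_user: Optional[str] = None      # registered user signed in, if any
    first_party_cookie: Optional[str] = None  # proprietor cookie on device, if any


# In-memory stand-ins for databases 116 and 118.
campaign_impressions = []
matchable_impressions = []


def is_matchable(imp):
    """Matchable if tied to a signed-in registered user or to a first-party cookie."""
    return imp.signed_in_user is not None or imp.first_party_cookie is not None


def log_impression(imp):
    # Every impression is logged in the campaign impressions store.
    campaign_impressions.append(imp)
    # Only matchable impressions are also logged in the matchable store.
    if is_matchable(imp):
        matchable_impressions.append(imp)


log_impression(Impression("cpn-1", "2021-05-12T10:00:00Z", "desktop", signed_in_user="user-42"))
log_impression(Impression("cpn-2", "2021-05-12T10:05:00Z", "mobile", first_party_cookie="cookie-9"))
log_impression(Impression("cpn-3", "2021-05-12T10:07:00Z", "tablet"))

print(len(campaign_impressions), len(matchable_impressions))
```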

As indicated above, the matchable impressions database 118 includes media impressions (and associated unique impression identifiers) that the database proprietor 102 is able to match to a particular user that has registered with the database proprietor 102. In some examples, the matchable impressions database 118 also includes user-based covariates that correspond to the particular registered user to which each impression in the database was matched. As used herein, a user-based covariate refers to any item(s) of information collected and/or generated by the database proprietor 102 that can be used to identify, characterize, quantify, and/or distinguish particular registered users and/or their associated behavior. For example, user-based covariates may include the name, age, and/or gender of the registered user (and/or any other demographic information about the registered user) collected at the time the registered user registered with the database proprietor 102, and/or the relative frequency with which the registered user uses the different types of client device 110, the number of media items the registered user has accessed during a most recent period of time (e.g., the last 30 days), the search terms entered by the registered user during a most recent period of time (e.g., the last 30 days), feature embeddings (numerical representations) of classifications of videos viewed and/or searches entered by the registered user, etc. As mentioned above, the matchable impressions database 118 also includes impressions matched to particular client devices 110 (based on first-party cookies), even when the impressions cannot be matched to particular registered users (based on the registered users being signed in at the time). In some such examples, the impressions matched to particular client devices 110 are treated as distinct users within the matchable impressions database 118.
However, as no particular user can be identified, such impressions in the matchable impressions database 118 will not be associated with any user-based covariates.

Although only one campaign impressions database 116 is shown in the illustrated example, the privacy-protected cloud environment 106 may include any number of campaign impressions databases 116, with each database storing impressions corresponding to different media campaigns associated with one or more different advertisers (e.g., product manufacturers, service providers, retailers, advertisement servers, etc.). In other examples, a single campaign impressions database 116 may store the impressions associated with multiple different campaigns. In some such examples, the campaign impressions database 116 may store a campaign identifier in connection with each impression to identify the particular campaign with which the impression is associated. Similarly, in some examples, the privacy-protected cloud environment 106 may include one or more matchable impressions databases 118 as appropriate. Further, in some examples, the campaign impressions database 116 and the matchable impressions database 118 may be combined and/or represented in a single database.

In the illustrated example of FIG. 1, impressions occurring on the client devices 110 are shown as being reported (e.g., via network communications) directly to both the campaign impressions database 116 and the matchable impressions database 118. However, this should not be interpreted as necessarily requiring multiple separate network communications from the client devices 110 to the database proprietor 102. Rather, in some examples, notifications of impressions are collected from a single network communication from the client device 110, and the database proprietor 102 then populates both the campaign impressions database 116 and the matchable impressions database 118. In some examples, the matchable impressions database 118 is generated based on an analysis of the data in the campaign impressions database 116. Regardless of the particular process by which the two databases 116, 118 are populated with logged impressions, in some examples, the user-based covariates included in the matchable impressions database 118 may be combined with the logged impressions in the campaign impressions database 116 and stored in an enriched impressions database 120. Thus, the enriched impressions database 120 includes all (e.g., census wide) logged impressions of the media 108 for the relevant advertising campaign and also includes all available user-based covariates associated with each of the logged impressions that the database proprietor 102 was able to match to a particular registered user.
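
As a hedged sketch of how the enriched impressions database 120 could be populated, the following example keeps every campaign impression and attaches covariates only where a match exists (conceptually, a left join). The dictionary layout and the `enrich` helper are hypothetical and chosen solely for illustration.

```python
from typing import Dict, List, Optional


def enrich(campaign_impressions: List[dict],
           covariates_by_impression: Dict[str, dict]) -> List[dict]:
    """Keep all campaign impressions; attach covariates where available."""
    enriched = []
    for imp in campaign_impressions:
        row = dict(imp)  # copy so the campaign record is not mutated
        # Unmatched impressions carry no covariates (None).
        covariates: Optional[dict] = covariates_by_impression.get(imp["impression_id"])
        row["covariates"] = covariates
        enriched.append(row)
    return enriched


campaign_impressions = [
    {"impression_id": "imp-1", "device_type": "desktop"},
    {"impression_id": "imp-2", "device_type": "mobile"},
    {"impression_id": "imp-3", "device_type": "tablet"},
]
# Hypothetical covariates keyed by the impressions that were matched to a registered user.
covariates_by_impression = {
    "imp-1": {"self_declared_age": 34, "media_items_last_30_days": 12},
    "imp-2": {"self_declared_age": 18, "media_items_last_30_days": 47},
}
enriched = enrich(campaign_impressions, covariates_by_impression)
```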

As shown in the illustrated example, whereas the database proprietor 102 is able to collect impressions from both panelist client devices 112 and non-panelist client devices 114, the AME 104 is limited to collecting impressions from panelist client devices 112. In some examples, the AME 104 also collects the impression identifier associated with each collected media impression so that the collected impressions may be matched with the impressions collected by the database proprietor 102 as described further below. In the illustrated example, the impressions (and associated impression identifiers) of the panelists are stored in an AME panel data database 122 that is within an AME first party data store 124 in an AME proprietary cloud environment 126. In some examples, the AME proprietary cloud environment 126 is a cloud-based storage system (e.g., a Google Cloud Project) provided by the database proprietor 102 that includes functionality to enable interfacing with the privacy-protected cloud environment 106 also maintained by the database proprietor 102. As mentioned above, the privacy-protected cloud environment 106 is governed by privacy constraints that prevent any party (with some limited exceptions for the database proprietor 102) from accessing private information associated with particular individuals. By contrast, the AME proprietary cloud environment 126 is indicated as proprietary because it is exclusively controlled by the AME such that the AME has full control and access to the data without limitation. While some examples involve the AME proprietary cloud environment 126 being a cloud-based system that is provided by the database proprietor 102, in other examples, the AME proprietary cloud environment 126 may be provided by a third party distinct from the database proprietor 102.

While the AME 104 is limited to collecting impressions (and associated identifiers) from only panelists (e.g., via the panelist client devices 112), the AME 104 is able to collect panel data that is much more robust than merely media impressions. As mentioned above, the panelist client devices 112 are associated with users that have agreed to participate on a panel of the AME 104. Participation in a panel includes the provision of detailed demographic information about the panelist and/or all members in the panelist's household. Such demographic information may include age, gender, race, ethnicity, education, employment status, income level, geographic location of residence, etc. In addition to such demographic information, which may be collected at the time a user enrolls as a panelist, the panelist may also agree to enable the AME 104 to track and/or monitor various aspects of the user's behavior. For example, the AME 104 may monitor panelists' Internet usage behavior including the frequency of Internet usage, the times of day of such usage, the websites visited, and the media to which the panelists are exposed (from which the media impressions are collected).

AME panel data (including media impressions and associated identifiers, demographic information, and Internet usage data) is shown in FIG. 1 as being provided directly to the AME panel data database 122 from the panelist client devices 112. However, in some examples, there may be one or more intervening operations and/or components that collect and/or process the collected data before it is stored in the AME panel data database 122. For instance, in some examples, impressions are initially collected and reported to a separate server and/or database that is distinct from the AME proprietary cloud environment 126. In some such examples, this separate server and/or database may not be a cloud-based system. Further, in some examples, such a non-cloud-based system may interface directly with the privacy-protected cloud environment 106 such that the AME proprietary cloud environment 126 may be omitted entirely.

In some examples, there may be multiple different techniques and/or methodologies used to collect the AME panel data that depend on the particular circumstances involved. For example, different monitoring techniques and/or different types of audience measurement meters 115 may be employed for media accessed via a desktop computer relative to the media accessed via a mobile computing device. In some examples, the audience measurement meter 115 may be implemented as a software application that panelists agree to install on their devices to monitor all Internet usage activity on the respective devices. In some examples, the meter 115 may prompt a user of a particular device to identify themselves so that the AME 104 can confirm the identity of the user (e.g., whether it was the mother or daughter in a panelist household). In some examples, prompting a user to self-identify may be considered overly intrusive. Accordingly, in some such examples, the circumstances surrounding the behavior of the user of a panelist client device 112 (e.g., time of day, type of content being accessed, etc.) may be analyzed to infer the identity of the user to some confidence level (e.g., the accessing of children's content in the early afternoon would indicate a relatively high probability that a child is using the device at that point in time). In some examples, the audience measurement meter 115 may be a separate hardware device that is in communication with a particular panelist client device 112 and enabled to monitor the Internet usage of the panelist client device 112.

In some examples, the processes and/or techniques used by the AME 104 to capture panel data (including media impressions and who in particular was exposed to the media) can differ depending on the nature of the panelist client device 112 through which the media was accessed. For instance, in some examples, the identity of the individual using the panelist client device 112 may be based on the individual responding to a prompt to self-identify. In some examples, such prompts are limited to desktop client devices because such a prompt is viewed as overly intrusive on a mobile device. However, without specifically prompting a user of a mobile device to self-identify, there often is no direct way to determine whether the user is the primary user of the device (e.g., the owner of the device) or someone else (e.g., a child of the primary user). Thus, there is the possibility of misattribution of media impressions within the panel data collected using mobile devices. In some examples, to overcome the issue of misattribution in the panel data, the AME 104 may develop a machine learning model that can predict the true user of a mobile device (or any device for that matter) based on information that the AME 104 does know for certain and/or has access to. For example, inputs to the machine learning model may include the composition of the panelist household, the type (e.g., genre and/or category) of the content, the daypart or time of day when the content was accessed, etc. In some examples, the truth data used to generate and validate such a model may be collected through field surveys in which the above input features are tracked and/or monitored for a subset of panelists that have agreed to be monitored in this manner (which is more intrusive than the typical passive monitoring of content accessed via mobile devices).

As mentioned above, in some examples, the AME panel data (stored in the AME panel data database 122) is merged with the database proprietor impressions data (stored in the matchable impressions database 118) within the privacy-protected cloud environment 106 to take advantage of the combination of the disparate sets of data to generate more robust and/or reliable audience measurement metrics. In particular, the database proprietor impressions data provides the advantage of volume. That is, the database proprietor impressions data corresponds to a much larger number of impressions than the AME panel data because the database proprietor impressions data includes census wide impression information that includes all impressions collected from both the panelist client devices 112 (associated with a relatively small pool of audience members) and the non-panelist client devices 114. The AME panel data provides the advantage of high-quality demographic data for a statistically significant pool of audience members (e.g., panelists) that may be used to correct for errors and/or biases in the database proprietor impressions data.

One source of error in the database proprietor impressions data is that the demographic information for matchable users collected by the database proprietor 102 during user registration may not be truthful. In particular, in some examples, many database proprietors impose age restrictions on their user accounts (e.g., a user must be at least 13 years of age or at least 18 years of age to register with the database proprietor 102). However, when a person registers with the database proprietor 102, the person typically self-declares their age and may, therefore, lie about their age (e.g., an 11-year-old may say they are 18 to bypass the age restrictions for a user account). Independent of age restrictions, a particular user may choose to enter an incorrect age for any other reason or no reason at all when registering with the database proprietor 102 (e.g., a 44-year-old may choose to assert they are only 25). Where a database proprietor 102 does not verify the self-declared age of registered users, there is a relatively high likelihood that the ages of at least some registered users of the database proprietor 102 stored in the matchable impressions database 118 (as a particular user-based covariate) are inaccurate. Further, it is possible that other self-declared demographic information (e.g., gender, race, ethnicity, income level, etc.) may also be falsified by users during registration.

As described further below, the AME panel data (which contains reliable demographic information about the panelists) can be used to correct for inaccurate demographic information in the database proprietor impressions data. Additionally, while the self-declared age of a particular registered user may be truthful and accurate, a different person of a different age may end up using a client device 110 on which the particular registered user is logged into the user account. For example, a child may access media on a client device 110 on which a parent of the child is logged into a user account of the database proprietor 102. As a result, media accessed by the child would be misattributed to the demographics (e.g., the self-declared age) of the parent.

Thus, even when self-declared demographic information is true, it may nevertheless be wrong with respect to the demographic characteristics of the person actually using the user account at any given point in time. This scenario is more common for client devices and/or user accounts that are used and/or shared by multiple different people (e.g., different members in a single household). As used herein, the term “shared user account” refers to a user account that is used by more than one panelist. As such, shared user accounts include user accounts that are intended to be shared by multiple individuals as well as user accounts that, as a matter of happenstance, are used by multiple individuals (e.g., a child inadvertently remains logged into their parent's user account after the parent used the same computer).

Another source of error in the database proprietor impressions data is based on the concept of misattribution, which arises in situations where multiple different people use the same client device 110 to access media. In some examples, the database proprietor 102 associates a particular impression with a particular registered user based on the registered user being signed into a platform provided by the database proprietor 102. For example, if a particular person signs into their Google account and begins watching a YouTube video on a particular client device 110, that person will be attributed with an impression for an ad served during the video because the person was signed in at the time. However, there may be instances where the person finishes using the client device 110 but does not sign out of his or her Google account. Thereafter, a second, different person (e.g., a different member in the family of the first person) begins using the client device 110 to view another YouTube video. Although the second person is now accessing media via the client device 110, ad impressions during this time will still be attributed to the first person because the first person is the one who is still indicated as being signed in (e.g., the user account of the first person has become a shared user account). Thus, there are likely to be circumstances where the actual person exposed to media 108 is misattributed to a different registered user of the database proprietor 102 and/or an unregistered user. The AME panel data (which includes an indication of the actual person using the panelist client devices 112 at any given moment) can be used to correct for misattribution in the demographic information in the database proprietor impressions data. As mentioned above, in some situations, the AME panel data may itself include misattribution errors.
Accordingly, in some examples, the AME panel data may first be corrected for misattribution before the AME panel data is used to correct misattribution in the database proprietor impressions data. An example methodology to correct for misattribution in the database proprietor impressions data is described in Singh et al., U.S. Pat. No. 10,469,903, which is hereby incorporated herein by reference in its entirety.

Misattribution can also occur where there are multiple shared user accounts on the same device. For example, a parent may log into his or her user account and forget to log out after using a communal desktop computer. Subsequently, a child may use the computer for a time before realizing that the parent's user account is logged in. Upon realization that the parent's user account is logged in, the child may log out of the parent's user account, and log into the child's user account. After the child completes use of the communal desktop computer, he or she may forget to log out. Subsequently, the child's brother or sister may use the communal family computer without realizing that the child's user account is logged in. Accordingly, when multiple shared user accounts are present in a panelist household, self-declared demographic information may be wrong with respect to the demographic characteristics of the person actually using the user account at any given point in time for multiple accounts. Here, as described above, the AME panel data (which includes an indication of the actual person using the panelist client devices 112 at any given moment) can be used to correct for misattribution in the demographic information in the database proprietor impressions data.

Another problem with the database proprietor impressions data is that of non-coverage. Non-coverage refers to impressions recorded by the database proprietor 102 that cannot be matched to a particular registered user of the database proprietor 102. The inability of the database proprietor 102 to match a particular impression to a particular user can occur for several reasons including that the registered user is not signed in at the time of the media impression, that the user has not established an account with the database proprietor 102, that the registered user has enabled Limited Ad Tracking (LAT) to prevent the user account from being associated with ad impressions, or that the content associated with the media being monitored corresponds to children's content (for which user-based tracking is not performed). While the inability of the database proprietor 102 to match and assign a particular impression to a particular registered user is not necessarily an error in the database proprietor impressions data, it does undermine the ability to reliably estimate the total unique audience size for (e.g., the number of unique individuals that were exposed to) a particular media item. For example, assume that the database proprietor 102 records a total of 11,000 impressions for media 108 in a particular advertising campaign. Further assume that of those 11,000 impressions, the database proprietor 102 is able to match 10,000 impressions to a total of 5,000 different users (e.g., each user was exposed to the media on average 2 times) but is unable to match the remaining 1,000 impressions to particular users.
Relying solely on the database proprietor impressions data, in this example, there is no way to determine whether the remaining 1,000 impressions should also be attributed to the 5,000 users already exposed at least once to the media 108 (for a total audience size of 5,000 people) or if one or more of the remaining 1,000 impressions should be attributed to other users not among the 5,000 already identified (for a total audience size of up to 6,000 people, which would occur if every one of the 1,000 impressions was associated with a different person not included in the matched 5,000 users). In some examples disclosed herein, the AME panel data can be used to estimate the distribution of impressions across different users associated with the non-coverage portion of impressions in the database proprietor impressions data to thereby estimate a total audience size for the relevant media 108.
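
The arithmetic of the foregoing non-coverage example can be reproduced with the following short sketch. The function name `audience_size_bounds` is hypothetical; the sketch only computes the range of possible audience sizes and does not represent the estimation performed using the AME panel data.

```python
from typing import Tuple


def audience_size_bounds(matched_users: int, unmatched_impressions: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the unique audience size."""
    # Lower bound: every unmatched impression belongs to an already-matched user.
    # If no user was matched but unmatched impressions exist, at least one person was exposed.
    lower = max(matched_users, 1 if unmatched_impressions > 0 else 0)
    # Upper bound: every unmatched impression belongs to a different, previously unseen person.
    upper = matched_users + unmatched_impressions
    return lower, upper


total_impressions = 11_000
matched_impressions = 10_000
matched_users = 5_000

unmatched_impressions = total_impressions - matched_impressions  # 1,000
average_frequency = matched_impressions / matched_users          # 2 exposures per matched user
bounds = audience_size_bounds(matched_users, unmatched_impressions)
```

Here `bounds` spans 5,000 to 6,000 people, which is the range of uncertainty that the database proprietor impressions data alone cannot resolve.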

Another confounding factor to the estimation of the total unique audience size for media based on the database proprietor impressions data is the existence of multiple user accounts of a single registered user. More particularly, in some situations a particular individual may establish multiple accounts with the database proprietor 102 for different purposes (e.g., a personal account, a work account, a shared user account, etc.). Such a situation can result in a larger number of different users being identified as audience members to media 108 than the actual number of individuals exposed to the media 108. For example, assume that a particular person registers three user accounts with the database proprietor 102 and is exposed to the media 108 once while signed into each of the three different accounts for a total of three impressions. In this scenario, the database proprietor 102 would match each impression to a different registered user based on the different user accounts, making it appear that three different people were exposed to the media 108 when, in fact, only one person was exposed to the media three different times. Examples disclosed herein use the AME panel data in conjunction with the database proprietor impressions data to estimate an actual unique audience size from the potentially inflated number of apparently unique users exposed to the media 108.
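
The three-account scenario above can be illustrated with the following sketch. The account and person labels are hypothetical, and the ratio computed at the end is merely one illustrative way to express the inflation; the text above does not specify how the actual unique audience size is estimated.

```python
# Each tuple is (user account that was signed in, person actually exposed).
# The person labels are assumed to be known here only for illustration
# (e.g., as they might be for panelists).
impressions = [
    ("acct-personal", "person-A"),
    ("acct-work", "person-A"),
    ("acct-shared", "person-A"),
]

# Audience size as it appears from account-level matching alone.
apparent_audience = len({account for account, _ in impressions})

# Audience size when impressions are attributed to actual individuals.
actual_audience = len({person for _, person in impressions})

# Illustrative (assumed) deduplication ratio: persons per apparent user.
dedup_ratio = actual_audience / apparent_audience
```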

In the illustrated example of FIG. 1, the AME panel data is merged with the database proprietor impressions data by an example data matching analyzer 128. In some examples, the data matching analyzer 128 implements an application programming interface (API) that takes the disparate datasets and matches registered users in the database proprietor impressions data with panelists in the AME panel data. In some examples, registered users are matched with panelists based on the unique impression identifiers (e.g., CPNs) collected in connection with the media impressions logged by both the database proprietor 102 and the AME 104. The combined data is stored in an AME intermediary merged data database 130 within an AME privacy-protected data store 132. The data in the AME intermediary merged data database 130 is referred to as “intermediary” because it is at an intermediate stage in the processing. That is, the data includes AME panel data that has been enhanced and/or combined with the database proprietor impressions data, but has not yet been corrected or adjusted to account for the sources of error and/or bias in the database proprietor impressions data as outlined above.
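
As a minimal sketch of matching on unique impression identifiers, the following example joins two lists of logged impressions on a shared identifier. The tuple layout and the `match_users_to_panelists` helper are assumptions for illustration and are not the API of the data matching analyzer 128.

```python
from typing import Dict, List, Tuple


def match_users_to_panelists(
    dp_impressions: List[Tuple[str, str]],   # (impression identifier, registered user)
    ame_impressions: List[Tuple[str, str]],  # (impression identifier, panelist)
) -> List[Dict[str, str]]:
    """Pair registered users with panelists that share an impression identifier."""
    panelist_by_impression = {imp_id: panelist for imp_id, panelist in ame_impressions}
    merged = []
    for imp_id, registered_user in dp_impressions:
        panelist = panelist_by_impression.get(imp_id)
        if panelist is not None:  # only impressions logged by both parties are merged
            merged.append({
                "impression_id": imp_id,
                "registered_user": registered_user,
                "panelist": panelist,
            })
    return merged


dp_impressions = [("cpn-1", "user-1"), ("cpn-2", "user-1"), ("cpn-3", "user-7")]
ame_impressions = [("cpn-1", "panelist-parent"), ("cpn-2", "panelist-child")]
merged = match_users_to_panelists(dp_impressions, ame_impressions)
```

In this sketch, the same registered user is paired with two different panelists, which is the kind of shared-account situation that the merged data can reveal but that has not yet been corrected at the intermediary stage.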

In some examples, the AME intermediary merged data is analyzed by an adjustment factor analyzer 134 to calculate adjustment or calibration factors that may be stored in an adjustment factors database 136 within an AME output data store 138 of the AME proprietary cloud environment 126. In some examples, the adjustment factor analyzer 134 calculates different types of adjustment factors to account for different types of errors and/or biases in the database proprietor impressions data. For instance, a multi-account adjustment factor corrects for the situation of a single registered user accessing media using multiple different user accounts associated with the database proprietor 102. A signed-out adjustment factor corrects for non-coverage associated with registered users that access media while signed out of their account associated with the database proprietor 102 (so that the database proprietor 102 is unable to associate the impression with the registered users). In some examples, the adjustment factor analyzer 134 is able to directly calculate the multi-account adjustment factor and the signed-out adjustment factor in a deterministic manner.

While the multi-account adjustment factors and the signed-out adjustment factors may be deterministically calculated, correcting for falsified or otherwise incorrect demographic information (e.g., incorrectly self-declared ages) of registered users of the database proprietor 102 cannot be solved in such a direct and deterministic manner. Rather, in some examples, a machine learning model is developed to analyze and predict the correct ages of registered users of the database proprietor 102. Specifically, as shown in FIG. 1, the privacy-protected cloud environment 106 implements a model generator 140 to generate a demographic correction model using the AME intermediary merged data (stored in the AME intermediary merged data database 130) as inputs.

More particularly, in some examples, self-declared demographics (e.g., the self-declared age) of registered users of the database proprietor 102, along with other covariates associated with the registered users, are used as the input variables or features used to train a model to predict the correct demographics (e.g., correct age) of the registered users as validated by the AME panel data, which serves as the truth data or training labels for the demographic correction model generation. However, in some examples, the self-declared age or other demographics of a registered user signed into a user account on a panelist client device 112 may not match the age or other demographics of the primary user of the user account.

As used herein, the term “primary user of a user account” refers to an individual whose demographic information an AME should attribute to a user account based on a primary user identification algorithm. While in some instances identifying the primary user of a user account may be straightforward (e.g., when a user account is used by only one person), in other cases the identity of the primary user of a user account is less apparent (e.g., when multiple people use the same user account). For example, a parent that buys a computer for a child may log into the computer with the parent's user account with the database proprietor 102 despite the child being the primary user of the parent's user account. Thus, because the registered user of a user account is not always the primary user of the user account, in some examples (e.g., in the case of a shared user account), merely relying on the demographics of the registered user may be insufficient to accurately monitor the demographics of the person exposed to media. By identifying the primary user of the user account, examples disclosed herein determine the demographics of the person that accessed media. Therefore, examples disclosed herein generate reliable demographics and/or other covariates associated with the registered users to train models to predict correct demographics.

In some examples, different demographic correction model(s) may be developed to correct for different types of demographic information that needs correcting. For instance, in some examples, a first model can be used to correct the self-declared age of registered users of the database proprietor 102 and a second model can be used to correct the self-declared gender of the registered users. Once the model(s) have been trained and validated based on the AME panel data, the model(s) are stored in a demographic correction models database 142.

As mentioned above, there are many different types of covariates collected and/or generated by the database proprietor 102. In some examples, the covariates provided by the database proprietor 102 may include a certain number (e.g., 100) of the top search result click entities and/or video watch entities for every user during a most recent period of time (e.g., for the last month). These entities are integer identifiers (IDs) that map to a knowledge graph of all entities for the search result clicks and/or videos watched. That is, as used in this context, an entity corresponds to a particular node in a knowledge graph maintained by the database proprietor 102. In some examples, the total number of unique IDs in the knowledge graph may number in the tens of millions. More particularly, for example, YouTube videos are classified across roughly 20 million unique video entity IDs and Google search results are classified across roughly 25 million unique search result entity IDs. In addition to the top search result click entities and/or video watch entities, the database proprietor 102 may also provide embeddings for these entities. An embedding is a numerical representation (e.g., a vector array of values) of some class of similar objects, images, words, and the like. For example, a particular user that frequently searches for and/or views cat videos may be associated with a feature embedding representative of the class corresponding to cats. Thus, feature embeddings translate relatively high dimensional vectors of information (e.g., text strings, images, videos, etc.) into a lower dimensional space to enable the classification of different but similar objects.

In some examples, multiple embeddings may be associated with each search result click entity and/or video watch entity. Accordingly, assuming the top 100 search result entities and video watch entities are provided among the covariates and that 16-dimension embeddings are provided for each such entity, this results in a 100×16 matrix of values for every user, which may be too much data to process during generation of the demographic correction models as described above. Accordingly, in some examples, the dimensionality of the matrix is reduced to a more manageable size to be used as an input feature for the demographic correction model generation.
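
The foregoing does not specify how the dimensionality of the matrix is reduced. Purely as an assumed illustration, the following sketch averages the 100 entity embeddings of a user column by column (mean pooling) so that the 100×16 matrix collapses to a single 16-value feature vector; other reduction techniques could equally be used.

```python
from typing import List


def mean_pool(embeddings: List[List[float]]) -> List[float]:
    """Average the rows of an (entities x dimensions) matrix into one vector."""
    num_entities = len(embeddings)
    num_dimensions = len(embeddings[0])
    return [
        sum(row[j] for row in embeddings) / num_entities
        for j in range(num_dimensions)
    ]


# Synthetic 100x16 matrix standing in for one user's entity embeddings.
entity_embeddings = [[float(i + j) for j in range(16)] for i in range(100)]

# 16-value user-level feature (1,600 values reduced to 16).
user_feature = mean_pool(entity_embeddings)
```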

In some examples, a process is implemented to track different demographic correction model experiments over time to achieve high quality (e.g., accurate) models and also for auditing purposes. Accomplishing this objective within the context of the privacy-protected cloud environment 106 presents several unique challenges because the model features (e.g., inputs and hyperparameters) and model performance (e.g., accuracy) are stored separately to satisfy the privacy constraints of the environment.

In some examples, a model analyzer 144 may implement and/or use one or more demographic correction models to generate predictions and/or inferences as to the actual demographics (e.g., actual ages) of registered users associated with media impressions logged by the database proprietor 102. That is, in some examples, as shown in FIG. 1, the model analyzer 144 uses one or more of the demographic correction models in the demographic correction models database 142 to analyze the impressions in the enriched impressions database 120 that were matched to a particular registered user of the database proprietor 102. The inferred demographic (e.g., age) for each registered user may be stored in a model inferences database 146 for subsequent use, retrieval, and/or analysis. Additionally or alternatively, in some examples, the model analyzer 144 uses one or more of the demographic correction models in the demographic correction models database 142 to analyze the entire registered user base of the database proprietor 102 regardless of whether the registered users are matched to any particular media impressions. After inferring the correct demographic (e.g., age) for each registered user, the inferences are stored in the model inferences database 146. In some such examples, when the registered users matched to particular impressions are to be analyzed (e.g., the registered users matched to impressions in the enriched impressions database 120), the model analyzer 144 merely extracts the inferred demographic assignment to each relevant registered user in the enriched impressions database 120 that matches with one or more media impressions.

As described above, in some examples, the database proprietor 102 may identify a particular registered user as corresponding to a particular impression based on the registered user being signed into the database proprietor 102. However, there are circumstances where the individual corresponding to the user account is not the actual person that was exposed to the relevant media. Accordingly, merely inferring a correct demographic (e.g., age) of the registered user associated with the signed-in user account may not yield the correct demographic of the actual person to which a particular media impression should be attributed. In other words, whereas the AME panelist data and the database proprietor impressions data are matched at the impression level, demographic correction is implemented at the user level. Therefore, before generating the demographic correction model, a method to reduce logged impressions to individual users is first implemented so that the demographic correction model can be reliably implemented. In particular, as shown in the illustrated example of FIG. 2, instead of using the AME intermediary merged data directly as an input to the model generator 140, in some examples, an impression-to-user analyzer 202 is implemented to generate user-level covariate data from the AME intermediary merged data by executing a primary user identification algorithm.

FIG. 2 illustrates the example system 100 of FIG. 1 with certain portions omitted for the sake of clarity and additional portions shown as described further below. In some examples, the user-level covariate data is stored within a user-level covariate data database 204 within the AME privacy-protected data store 132 and provided as an input to the model generator 140. In some examples, the impression-to-user analyzer 202 and the model generator 140 may be incorporated together and implemented in combination. Example processes to implement the example impression-to-user analyzer 202 of FIG. 2 are represented by the flowcharts in FIGS. 3, 4, and 5.

In the illustrated example of FIG. 2, the model generator 140 is implemented by processor circuitry (e.g., one or more processors, one or more accelerators, etc.) executing instructions. In the example of FIG. 2, the model generator 140 executes a training algorithm (e.g., stochastic gradient descent) to train one or more demographic correction models to operate in accordance with patterns and/or associations based on, for example, training data (e.g., the user-level covariate data from the AME intermediary merged data). In general, the one or more demographic correction models include internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model to transform input data into output data. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.

In some examples, the example model generator 140 implements example means for generating models. The means for generating models is implemented by executable instructions such as that implemented by at least block 304 of FIG. 3. The executable instructions of block 304 of FIG. 3 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for generating models is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the impression-to-user analyzer 202 is implemented by processor circuitry (e.g., one or more processors, one or more accelerators, etc.) executing instructions. In the example of FIG. 2, the impression-to-user analyzer 202 includes an example network interface 206, an example user account controller 208, an example impression controller 210, an example demographic controller 212, and an example score management controller 214. In the example of FIG. 2, any of the network interface 206, the user account controller 208, the impression controller 210, the demographic controller 212, and/or the score management controller 214 can communicate via an example communication bus 216.

In examples disclosed herein, the communication bus 216 may be implemented using any suitable wired and/or wireless communication. In additional or alternative examples, the communication bus 216 includes software, machine readable instructions, and/or communication protocols by which information is communicated among the network interface 206, the user account controller 208, the impression controller 210, the demographic controller 212, and/or the score management controller 214.

In the illustrated example of FIG. 2, the impression-to-user analyzer 202 determines the primary user from among multiple potential users of one or more user accounts of the database proprietor 102. As described above, it is not uncommon for multiple different people (e.g., different members in a single household) to use a user account. For instance, different members in a household may use a common computing device in which a single person (e.g., a single registered user) remains logged in to a user account of the database proprietor 102 while multiple different people use the computer. For example, a first member of the household may log in to the user account of database proprietor 102 for the first member on the computer and then stop using the computer without logging out. Thereafter, a second member of the household may begin using the computer while the first member is still logged in to the user account of the database proprietor 102.

In some examples, the second member of the household may use the computer far more often than the first member of the household. In such examples, the demographics of the second member of the household may not match the demographics of the registered user (e.g., the first member of the household). In some such examples, the impression-to-user analyzer 202 may determine that the second member of the household is the primary user of the account (e.g., based on the primary user identification algorithm disclosed herein) even though the first member created the account in his or her own name and included demographics information (e.g., a self-declared age) for himself or herself. With the primary user identified, the impression-to-user analyzer 202 adjusts demographic information of user accounts to reflect the primary users of the user accounts.

In some examples, the example impression-to-user analyzer 202 implements example means for adjusting demographics. The means for adjusting demographics is implemented by executable instructions such as that implemented by at least block 302 of FIG. 3; at least blocks 402, 404, and 406 of FIG. 4; and/or at least blocks 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, and 532 of FIG. 5. The executable instructions of block 302 of FIG. 3; blocks 402, 404, and 406 of FIG. 4; and/or blocks 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, and 532 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for adjusting demographics is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the network interface 206 is implemented by processor circuitry executing instructions. For example, the network interface 206 may be implemented by a network interface controller. In additional or alternative examples, the network interface 206 can be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)).

In the illustrated example of FIG. 2, the network interface 206 obtains the AME intermediary merged data (stored in the AME intermediary merged data database 130). After obtaining the AME intermediary merged data, the network interface 206 forwards the AME intermediary merged data to the user account controller 208, the impression controller 210, the demographic controller 212, and/or the score management controller 214. Additionally, after the impression-to-user analyzer 202 determines the primary user of a user account of the database proprietor 102 from among other potential users of the user account, the network interface 206 associates the primary user (and their demographics) with the user account. For example, the network interface 206 stores the demographics of the primary user for the user account (e.g., in the user-level covariate data database 204).

In some examples, the example network interface 206 implements example means for interfacing. The means for interfacing is implemented by executable instructions such as that implemented by at least block 406 of FIG. 4 and/or at least blocks 502 and 528 of FIG. 5. The executable instructions of block 406 of FIG. 4 and/or blocks 502 and 528 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for interfacing is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the user account controller 208 is implemented by processor circuitry executing instructions. In additional or alternative examples, the user account controller 208 can be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s). In the example of FIG. 2, the user account controller 208 selects user accounts to be analyzed by the impression-to-user analyzer 202. For example, the user account controller 208 identifies a user account represented in the AME panel data included within the AME intermediary merged data. Example AME panel data is collected by device type. For example, a first panel may monitor panelist use of a desktop or laptop computer while a second panel may monitor panelist use of a tablet or smart phone. However, examples disclosed herein are not limited to a panel collected for a single device. On the contrary, examples disclosed herein can determine the primary user of a user account that is logged in on multiple devices of different types (e.g., a cell phone and a laptop) and/or a user account that is logged in on multiple devices of the same type (e.g., multiple desktop computers).

In the illustrated example of FIG. 2, the identified user account may be associated with a panelist household including more than one panelist. In this manner, the identified user account is associated with one or more panelists. As such, the identified user account may be a shared user account for which the registered user is not necessarily the primary user. After selecting a user account, the user account controller 208 forwards an identifier of the selected user account to the impression controller 210, the demographic controller 212, and/or the score management controller 214.

As described above, in some examples, the data matching analyzer 128 matches user accounts of the database proprietor 102 with AME panelists based on the unique impression identifiers (e.g., CPNs) collected in connection with the media impressions logged by both the database proprietor 102 and the AME 104. For example, as permitted by the laws and/or regulations of the geographic location of registered users and/or panelists, the data matching analyzer 128 may utilize the AME panel data such as a panelist's name, a panelist's location, among others, to compare against similar fields provided by the database proprietor 102. If the data matching analyzer 128 determines that the AME panel data matches similar data provided by the database proprietor 102, the data matching analyzer 128 obtains the impression identifier (e.g., a CPN) associated with each impression experienced by an individual logged in to the user account (e.g., the registered user). In this manner, the data matching analyzer 128 maps the data associated with the impression identifier to the AME panel data.

In some examples, the example user account controller 208 implements example means for managing user accounts. The means for managing user accounts is implemented by executable instructions such as that implemented by at least blocks 504, 530, and 532 of FIG. 5. The executable instructions of blocks 504, 530, and 532 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for managing user accounts is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the impression controller 210 is implemented by processor circuitry executing instructions. In additional or alternative examples, the impression controller 210 can be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s). In the example of FIG. 2, the impression controller 210 determines respective percentages of impressions for the selected user account that have been attributed to one or more panelists associated with the selected user account.

The percentages indicate the proportion of impressions logged for the selected user account that were actually experienced by each panelist associated with the selected user account. For example, for a shared user account, 30% of the impressions associated with the shared user account may be experienced by a first parent in a household (e.g., a father), 10% of the impressions associated with the shared user account may be experienced by a second parent in the household (e.g., a mother), and 60% of the impressions associated with the shared user account may be experienced by a child in the household.
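By way of illustration only, the per-panelist proportions described above could be computed as in the following Python sketch. The function name `impression_shares`, the panelist labels, and the representation of logged impressions as a list of panelist identifiers are assumptions made for this illustration and do not appear in the figures.

```python
from collections import Counter


def impression_shares(attributed_panelists):
    """Return each panelist's fraction of the impressions logged for one account.

    `attributed_panelists` holds one panelist identifier per logged impression
    (a hypothetical representation chosen for this sketch).
    """
    counts = Counter(attributed_panelists)
    total = sum(counts.values())
    if total == 0:
        return {}  # no impressions logged for the account
    return {panelist: count / total for panelist, count in counts.items()}


# Hypothetical shared account with ten logged impressions.
impressions = ["father"] * 3 + ["mother"] * 1 + ["child"] * 6
shares = impression_shares(impressions)
for panelist, share in shares.items():
    print(f"{panelist}: {share:.0%}")
```

In this hypothetical household, the child accounts for the largest share of the impressions even though the account may be registered to a parent.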

In the illustrated example of FIG. 2, the impression controller 210 determines which panelist experienced an impression based on different criteria associated with the type of client device on which the panelist experienced the impression. For example, if the client device is a desktop client device, the impression controller 210 determines which panelist experienced the impression based on the response to a prompt to self-identify. Alternatively, as mentioned above, for mobile client devices, such a prompt is typically viewed as overly intrusive. Accordingly, for many mobile client devices, the impression controller 210 attributes 100% of the impressions logged for the selected user account to the panelist registered to the mobile client device, based on the AME panel data.

In the illustrated example of FIG. 2, the impression controller 210 determines respective impression scores for the panelists associated with the selected user account. For example, the impression controller 210 determines the impression score for each panelist based on the percentage of impressions attributed to each panelist. For example, if between 60% and 80% of the impressions associated with the selected user account (e.g., selected shared user account) are attributed to a first panelist, the impression controller 210 assigns an impression score of four out of five (e.g., 4/5) for the first panelist. In additional or alternative examples, different scales for scoring may be used. After determining the respective impression scores for the panelists, the impression controller 210 forwards the respective impression scores and associated impression percentages to the score management controller 214.
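One possible mapping from impression percentage to impression score is sketched below in Python. Only the 60% to 80% band mapping to 4/5 is stated above; the equal-width bands for the remaining scores, the treatment of band boundaries (a boundary value falls in the lower band), and the name `impression_score` are assumptions of this sketch.

```python
import math


def impression_score(percent, max_score=5):
    """Map an impression percentage (0 to 100) onto a 0..max_score scale.

    Assumes equal-width bands (20 points each on a five-point scale), with a
    boundary value such as 60 assigned to the lower band. Both choices are
    assumptions made for illustration.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    band_width = 100 / max_score
    return math.ceil(percent / band_width)


print(impression_score(70))  # falls in the 60%-80% band
```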

In some examples, the example impression controller 210 implements example means for managing impressions. The means for managing impressions is implemented by executable instructions such as that implemented by at least blocks 506 and 508 of FIG. 5. The executable instructions of blocks 506 and 508 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for managing impressions is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the demographic controller 212 is implemented by processor circuitry executing instructions. In additional or alternative examples, the demographic controller 212 can be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s). In the example of FIG. 2, the demographic controller 212 determines respective gender scores and respective age scores for the panelists associated with the selected user account.

For example, the demographic controller 212 compares the self-declared gender of the selected user account (indicated in the database proprietor impressions data) to respective genders of the panelists associated with the selected user account (e.g., as indicated from the AME panel data). Based on the comparison, the demographic controller 212 defines respective gender scores for the panelists. In some examples, if the self-declared gender of the selected user account matches the gender of a first panelist associated with the selected user account (e.g., indicated from the AME panel data), the demographic controller 212 defines a full gender score (e.g., 5/5) for the first panelist. In some examples, if the self-declared gender of the selected user account is not stated or undetermined, the demographic controller 212 defines a zero gender score (e.g., 0/5) for the first panelist. In additional or alternative examples, different scales for scoring may be used.

In some examples, the self-declared gender of the selected user account may be non-binary. In such examples, the self-declared gender of the selected user account will not match the gender of any panelist because the AME panel data is collected on a binary scale (e.g., male or female). In such examples, if the self-declared gender of the selected user account is non-binary and does not match the gender of the first panelist indicated from the AME panel data, the demographic controller 212 assigns a zero gender score (e.g., 0/5) for the first panelist. In additional or alternative examples, different scales for scoring may be used.

In some examples, the self-declared gender of the selected user account may conflict with the gender of a panelist indicated from the AME panel data. For example, if the self-declared gender of the selected user account is male and the gender of a first panelist associated with the selected user account is female (as indicated from the AME panel data), the demographic controller 212 assigns a negative gender score (e.g., −5/5) for the first panelist. In additional or alternative examples, different scales for scoring may be used. For example, the gender scores could be scaled to include only positive and zero values.
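The three gender scoring outcomes described above (match, no usable declaration, and conflict) can be summarized in a short Python sketch. The function name `gender_score`, the string labels, and the use of `None` for an unstated gender are assumptions of this illustration rather than part of the disclosed apparatus.

```python
# Panel data is described above as collected on a binary scale.
PANEL_GENDERS = ("male", "female")


def gender_score(declared_gender, panelist_gender, max_score=5):
    """Score one panelist against the account's self-declared gender.

    Returns +max_score on a match, 0 when the declared gender is unstated,
    undetermined, or non-binary, and -max_score on a conflict.
    """
    if declared_gender not in PANEL_GENDERS:
        return 0  # None, "unknown", "non-binary", etc. cannot match or conflict
    if declared_gender == panelist_gender:
        return max_score
    return -max_score


print(gender_score("male", "male"))        # match
print(gender_score(None, "female"))        # not stated
print(gender_score("non-binary", "male"))  # non-binary declaration
print(gender_score("male", "female"))      # conflict
```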

In the illustrated example of FIG. 2, the demographic controller 212 compares the self-declared age of the selected user account (indicated in the database proprietor impressions data) to respective ages of the panelists associated with the selected user account (e.g., as indicated from the AME panel data). Based on the comparison, the demographic controller 212 defines respective age scores for the panelists associated with the selected user account. For example, the demographic controller 212 determines a difference between the self-declared age of the selected user account and the age of a first panelist indicated from the AME panel data. In some examples, for smaller differences between the self-declared age of the selected user account and the ages of the panelists, the demographic controller 212 assigns higher age scores for the panelists. For example, if the self-declared age of the selected user account is 25 and the age of a first panelist is 25 (as indicated from the AME panel data), the demographic controller 212 assigns a full age score (e.g., 5/5) for the first panelist. After determining the respective gender scores and the respective age scores for the panelists associated with the selected user account, the demographic controller 212 forwards the respective gender scores and the respective age scores to the score management controller 214.
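A minimal Python sketch of an age score that decreases as the age difference grows is shown below. The description above specifies only that smaller differences yield higher scores and that an exact match yields a full score; the rate of one point lost per five years of difference, the floor at zero, and the name `age_score` are hypothetical choices made for this illustration.

```python
def age_score(declared_age, panelist_age, max_score=5, years_per_point=5):
    """Score one panelist by closeness to the account's self-declared age.

    Hypothetical rule: start from max_score and subtract one point for every
    full `years_per_point` years of difference, never going below zero.
    """
    difference = abs(declared_age - panelist_age)
    return max(0, max_score - difference // years_per_point)


print(age_score(25, 25))  # exact match yields the full score
print(age_score(25, 32))  # a small difference loses one point under this rule
print(age_score(25, 60))  # a large difference bottoms out at zero
```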

In some examples, the example demographic controller 212 implements example means for managing demographics. The means for managing demographics is implemented by executable instructions such as that implemented by at least blocks 510, 512, 514, and 516 of FIG. 5. The executable instructions of blocks 510, 512, 514, and 516 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for managing demographics is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

In the illustrated example of FIG. 2, the score management controller 214 is implemented by processor circuitry executing instructions. In additional or alternative examples, the score management controller 214 can be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), GPU(s), DSP(s), ASIC(s), PLD(s) and/or FPLD(s). In the example of FIG. 2, the score management controller 214 determines respective total scores for the panelists associated with the selected user account based on the respective impression scores, the respective gender scores, and the respective age scores of the panelists.

For example, to determine the respective total scores for the panelists, the score management controller 214 sums the respective impression score, the respective gender score, and the respective age score of the panelists. The score management controller 214 determines the total score for a panelist to be the sum of the score the panelist attained in each category (e.g., impression score, gender score, and age score) out of the total possible points for the sum of those categories. Thus, if a first panelist has an impression score of four out of five (e.g., 4/5), a gender score of five out of five (e.g., 5/5), and an age score of two out of five (e.g., 2/5), the score management controller 214 determines the total score for the first panelist to be 11 (e.g., 11=4+5+2) out of 15 (15=5+5+5).

Examples disclosed herein use the same scale for impression scores, gender scores, and age scores (e.g., a five-point scale, x/5, etc.). In this manner, the impression scores, gender scores, and age scores are normalized, which allows for equal weighting of the categories (e.g., impression score, gender score, age score). In additional or alternative examples, different scales can be used for each category. In this manner, if a category is considered more important, the category can be weighted more heavily by increasing the possible score (e.g., a ten-point scale, x/10, etc.).
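The total score computation and the effect of using a larger scale for one category can be illustrated with the following Python sketch. The `CategoryScore` type, the `total_score` function, and the dictionary layout are names introduced for this illustration only.

```python
from typing import NamedTuple


class CategoryScore(NamedTuple):
    points: int    # points the panelist attained in the category
    possible: int  # maximum points available in the category


def total_score(scores):
    """Sum attained and possible points across all scoring categories."""
    earned = sum(score.points for score in scores.values())
    possible = sum(score.possible for score in scores.values())
    return earned, possible


# Worked example from the description: 4/5 + 5/5 + 2/5 = 11 out of 15.
first_panelist = {
    "impression": CategoryScore(4, 5),
    "gender": CategoryScore(5, 5),
    "age": CategoryScore(2, 5),
}
print(total_score(first_panelist))  # (11, 15)

# Hypothetical variant: impressions weighted more heavily via a ten-point scale.
weighted_panelist = dict(first_panelist, impression=CategoryScore(8, 10))
print(total_score(weighted_panelist))  # (15, 20)
```

Because each category contributes its own maximum to the denominator, enlarging the scale of one category increases its influence on the total without any separate weighting coefficient.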

In the illustrated example of FIG. 2, after determining the respective total scores for the panelists associated with the selected user account, the score management controller 214 compares the respective percentages of impressions of the panelists associated with the respective total scores to a first threshold. For example, the score management controller 214 compares the respective percentages of impressions to the first threshold to determine if any total score is based on a percentage of impressions that satisfies (e.g., exceeds) the first threshold. In examples disclosed herein, the first threshold is 50%.

In this manner, only the panelist with the highest percentage of impressions can be selected as the primary user of the selected user account, provided the percentage satisfies the first threshold (e.g., exceeds 50%). In additional or alternative examples, a different percentage can be used for the first threshold. If multiple total scores are based on percentages of impressions that satisfy the first threshold, the score management controller 214 selects a highest one of the total scores.

In the illustrated example of FIG. 2, the score management controller 214 additionally compares the selected total score to a second threshold to determine whether the selected total score satisfies (e.g., exceeds) the second threshold. In examples disclosed herein, the second threshold is configurable and may be tuned as necessary to achieve reliable data that provides accurate modeling. In response to the score management controller 214 determining that the selected total score does not satisfy the second threshold, the score management controller 214 disregards the selected user account. In some examples, if the score management controller 214 determines that the selected total score does not satisfy the second threshold and the data matching analyzer 128 determines that the AME panel data matches similar data provided by the database proprietor 102, the score management controller 214 forwards the data associated with the impression identifier (e.g., a CPN) and the corresponding AME panel data to the network interface 206 to be stored in the user-level covariate data database 204.

In the illustrated example of FIG. 2, in response to the score management controller 214 determining that the selected total score satisfies the second threshold, the score management controller 214 forwards the demographics of the panelist with the selected total score and the selected total score to the network interface 206 to be stored in the user-level covariate data database 204. To store the demographics of the panelist with the selected total score and the selected total score in the user-level covariate data database 204, the network interface 206 associates the demographics of the panelist with the selected total score and the selected total score with the selected user account. In this manner, the selected user account is associated with the demographics and other covariates of the panelist with the selected total score.
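The two-threshold selection described above may be sketched in Python as follows. The `Candidate` type, the `select_primary_user` function, the panelist values, and the default second threshold of 9 are hypothetical; the description states only that the first threshold is 50% in disclosed examples and that the second threshold is configurable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    panelist_id: str
    impression_percent: float  # share of the account's impressions, 0 to 100
    total_score: int           # sum of impression, gender, and age scores


def select_primary_user(
    candidates: Sequence[Candidate],
    first_threshold: float = 50.0,  # percentage of impressions
    second_threshold: int = 9,      # hypothetical tunable total-score cutoff
) -> Optional[Candidate]:
    """Return the primary user, or None if the account is to be disregarded."""
    # Step 1: keep only panelists whose impression share exceeds the first threshold.
    eligible = [c for c in candidates if c.impression_percent > first_threshold]
    if not eligible:
        return None
    # Step 2: among those, select the highest total score.
    best = max(eligible, key=lambda c: c.total_score)
    # Step 3: the selected total score must also exceed the second threshold.
    return best if best.total_score > second_threshold else None


household = [
    Candidate("father", 30.0, 12),
    Candidate("mother", 10.0, 6),
    Candidate("child", 60.0, 13),
]
primary = select_primary_user(household)
print(primary.panelist_id if primary else "account disregarded")
```

With a first threshold of 50%, at most one panelist can be eligible; the `max` step matters only when a lower first threshold is configured.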

In some examples, the example score management controller 214 implements example means for managing scores. The means for managing scores is implemented by executable instructions such as that implemented by at least blocks 402 and 404 of FIG. 4 and/or at least blocks 518, 520, 522, 524, and 526 of FIG. 5. The executable instructions of blocks 402 and 404 of FIG. 4 and/or blocks 518, 520, 522, 524, and 526 of FIG. 5 may be executed on at least one processor such as the example processor 712 of FIG. 7. In other examples, the means for managing scores is implemented by hardware logic, hardware implemented state machines, logic circuitry, and/or any other combination of hardware, software, and/or firmware.

After the impression-to-user analyzer 202 corrects falsified and/or otherwise incorrect demographic data for panelists, the model generator 140 may use the corrected demographic data to train one or more demographic correction models to generate predictions and/or inferences as to the actual demographics (e.g., actual ages) of registered users (e.g., non-panelists) of the database proprietor 102. Returning to FIG. 1, with inferences made to correct for inaccurate demographic information of database proprietor users (e.g., falsified self-declared ages) and stored in the model inferences database 146, the AME 104 may be interested in extracting audience measurement metrics based on the corrected data. However, as mentioned above, the data contained inside the privacy-protected cloud environment 106 is subject to privacy constraints. In some examples, the privacy constraints ensure that the data can only be extracted for review and/or analysis in aggregate so as to protect the privacy of any particular individual represented in the data (e.g., a panelist of the AME 104 and/or a registered user of the database proprietor 102). Accordingly, in some examples, a data aggregator 148 aggregates the audience measurement data associated with particular media campaigns before the data is provided to an aggregated campaign data database 150 in the AME output data store 138 of the AME proprietary cloud environment 126.

The data aggregator 148 may aggregate data in different ways for different types of audience measurement metrics. For instance, at the highest level, the aggregated data may provide the total impression count and total number of registered users (e.g., estimated audience size) exposed to the media 108 for a particular media campaign. As mentioned above, the total number of registered users reported by the data aggregator 148 is based on the total number of unique user accounts matched to impressions but does not include the individuals associated with impressions that were not matched to a particular registered user (e.g., non-coverage). However, the total number of unique user accounts does not account for the fact that a single individual may correspond to more than one user account (e.g., multi-account users), and does not account for situations where a person other than a registered user was exposed to the media 108 (e.g., misattribution). These errors in the aggregated data may be corrected based on the adjustment factors stored in the adjustment factors database 136. Further, in some examples, the aggregated data may include an indication of the demographic composition of the registered users represented in the aggregated data (e.g., number of males vs females, number of registered users in different age brackets, etc.).

Additionally or alternatively, in some examples, the data aggregator 148 may provide aggregated data that is associated with a particular aspect of a media campaign. For instance, the data may be aggregated based on particular sites (e.g., all media impressions served on YouTube.com). In other examples, the data may be aggregated based on placement information (e.g., aggregated based on particular primary content videos accessed by users when the media advertisement was served). In other examples, the data may be aggregated based on device type (e.g., impressions served via a desktop computer versus impressions served via a mobile device). In other examples, the data may be aggregated based on a combination of one or more of the above factors and/or based on any other relevant factor(s).

In some examples, the privacy constraints imposed on the data within the privacy-protected cloud environment 106 include a limitation that data cannot be extracted (even when aggregated) for fewer than a threshold number of individuals (e.g., 50 individuals). Accordingly, if the particular metric being sought includes fewer than the threshold number of individuals, the data aggregator 148 will not provide such data. For instance, if the threshold number of individuals is 50 but there are only 46 females in the age range of 18-25 that were exposed to particular media 108, the data aggregator 148 would not provide the aggregate data for females in the 18-25 age bracket. Such privacy constraints can leave gaps in the audience measurement metrics, particularly in locations where the number of panelists is relatively small. Accordingly, in some examples, when audience measurement is not available for a particular demographic segment of interest in a particular region (e.g., a particular country), the audience measurement metrics in one or more comparable region(s) may be used to impute the metrics for the missing data in the first region of interest. In some examples, the particular metrics imputed from comparable regions are based on a comparison of audience metrics for which data is available in both regions. For instance, while data for females in the 18-25 bracket may be unavailable, assume that data for females in the 26-35 age bracket is available. The metrics associated with the 26-35 age bracket in the region of interest may be compared with metrics for the 26-35 age bracket in other regions and the regions with the closest metrics to the region of interest may be selected for use in calculating imputation factor(s).
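The minimum-size constraint and the selection of comparable regions described above can be sketched in Python as follows. The function names, segment labels, region names, and metric values are hypothetical, and using the absolute difference of a single metric as the measure of closeness is an assumption of this sketch; the description does not specify how closeness is computed.

```python
def release_aggregates(segment_counts, min_individuals=50):
    """Withhold any segment covering fewer than `min_individuals` people."""
    return {
        segment: count
        for segment, count in segment_counts.items()
        if count >= min_individuals
    }


def closest_regions(target_metric, other_region_metrics, k=1):
    """Rank other regions by closeness of a shared metric to the target region.

    Closeness is measured here as the absolute difference of one metric,
    which is an assumption made for illustration.
    """
    ranked = sorted(
        other_region_metrics,
        key=lambda region: abs(other_region_metrics[region] - target_metric),
    )
    return ranked[:k]


# Hypothetical counts: 46 females aged 18-25 falls below the threshold of 50.
counts = {"F18-25": 46, "F26-35": 212, "M18-25": 75}
print(release_aggregates(counts))  # the F18-25 segment is withheld

# Hypothetical metric for the 26-35 bracket, available in every region.
others = {"region_a": 0.40, "region_b": 0.55, "region_c": 0.43}
print(closest_regions(0.42, others, k=2))
```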

As shown in the illustrated example, both the adjustment factorsdatabase 136 and the aggregated campaigns data database 150 are includedwithin the AME output data store 138 of the AME proprietary cloudenvironment 126. As mentioned above, in some examples, the AMEproprietary cloud environment 126 is provided by the database proprietor102 and enables data to be provided to and retrieved from theprivacy-protected cloud environment. In some examples, the aggregatedcampaign data and the adjustment factors are subsequently transferred toa separate computing apparatus 152 of the AME 104 for analysis by anaudience metrics analyzer 154. In some examples, the separate computingapparatus may be omitted with its functionality provided by the AMEproprietary cloud environment 126. In other examples, the AMEproprietary cloud environment 126 may be omitted with the adjustmentfactors and the aggregated data provided directly to the computingapparatus 152. Further, in this example, the AME panel data database 122is within the AME first party data store 124, which is shown as beingseparate from the AME output data store 138. However, in other examples,the AME first party data store 124 and the AME output data store 138 maybe combined.

In the illustrated example of FIG. 1, the audience metrics analyzer 154 applies the adjustment factors to the aggregated data to correct for errors in the data including misattribution, non-coverage, and multi-count users. The output of the audience metrics analyzer 154 corresponds to the final calibrated data of the AME 104 and is stored in a final calibrated data database 156. In this example, the computing apparatus 152 also includes a report generator 158 to generate reports based on the final calibrated data.

While an example manner of implementing the privacy-protected cloudenvironment 106 of FIG. 1 is illustrated in FIG. 1 , one or more of theelements, processes and/or devices illustrated in FIG. 1 may becombined, divided, re-arranged, omitted, eliminated and/or implementedin any other way. Further, the example campaign impressions database116, example matchable impressions database 118, the example enrichedcampaign impressions database 120, the example data matching analyzer128, the example AME intermediary merged data database 130, the exampleAME privacy-protected data store 132, the example adjustment factoranalyzer 134, the example impression-to-user analyzer 202, the exampleuser-level covariate database 204, the example network interface 206,the example user account controller 208, the example impressioncontroller 210, the example demographic controller 212, the examplescore management controller 214, the example model generator 140, theexample demographic correction models database 142, the example modelanalyzer 144, the example model inferences database 146, the exampledata aggregator 148, and/or, more generally, the exampleprivacy-protected cloud environment 106 of FIG. 1 and/or FIG. 2 may beimplemented by hardware, software, firmware and/or any combination ofhardware, software and/or firmware. 
Thus, for example, any of theexample campaign impressions database 116, example matchable impressionsdatabase 118, the example enriched campaign impressions database 120,the example data matching analyzer 128, the example AME intermediarymerged data database 130, the example AME privacy-protected data store132, the example adjustment factor analyzer 134, the exampleimpression-to-user analyzer 202, the example user-level covariatedatabase 204, the example network interface 206, the example useraccount controller 208, the example impression controller 210, theexample demographic controller 212, the example score managementcontroller 214, the example model generator 140, the example demographiccorrection models database 142, the example model analyzer 144, theexample model inferences database 146, the example data aggregator 148,and/or, more generally, the example privacy-protected cloud environment106 of FIG. 1 and/or FIG. 2 could be implemented by one or more analogor digital circuit(s), logic circuits, programmable processor(s),programmable controller(s), graphics processing unit(s) (GPU(s)),digital signal processor(s) (DSP(s)), application specific integratedcircuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or fieldprogrammable logic device(s) (FPLD(s)). 
When reading any of theapparatus or system claims of this patent to cover a purely softwareand/or firmware implementation, at least one of the example campaignimpressions database 116, example matchable impressions database 118,the example enriched campaign impressions database 120, the example datamatching analyzer 128, the example AME intermediary merged data database130, the example AME privacy-protected data store 132, the exampleadjustment factor analyzer 134, the example impression-to-user analyzer202, the example user-level covariate database 204, the example networkinterface 206, the example user account controller 208, the exampleimpression controller 210, the example demographic controller 212, theexample score management controller 214, the example model generator140, the example demographic correction models database 142, the examplemodel analyzer 144, the example model inferences database 146, theexample data aggregator 148 is/are hereby expressly defined to include anon-transitory computer readable storage device or storage disk such asa memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-raydisk, etc. including the software and/or firmware. Further still, theexample privacy-protected cloud environment 106 of FIG. 1 may includeone or more elements, processes and/or devices in addition to, orinstead of, those illustrated in FIG. 1 and/or FIG. 2 , and/or mayinclude more than one of any or all of the illustrated elements,processes and devices. As used herein, the phrase “in communication,”including variations thereof, encompasses direct communication and/orindirect communication through one or more intermediary components, anddoes not require direct physical (e.g., wired) communication and/orconstant communication, but rather additionally includes selectivecommunication at periodic intervals, scheduled intervals, aperiodicintervals, and/or one-time events.

Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing aspects of the privacy-protected cloud environment 106 of FIG. 1 are shown in FIGS. 3, 4, and/or 5. The machine readable instructions may be one or more executable programs or portion(s) of an executable program for execution by a computer processor and/or processor circuitry, such as the processor 712 shown in the example processor platform 700 discussed below in connection with FIG. 7. The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 712, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 712 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowcharts illustrated in FIGS. 3, 4, and/or 5, many other methods of implementing the example privacy-protected cloud environment 106 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware. The processor circuitry may be distributed in different network locations and/or local to one or more devices (e.g., a multi-core processor in a single machine, multiple processors distributed across a server rack, etc.).

The machine readable instructions described herein may be stored in oneor more of a compressed format, an encrypted format, a fragmentedformat, a compiled format, an executable format, a packaged format, etc.Machine readable instructions as described herein may be stored as dataor a data structure (e.g., portions of instructions, code,representations of code, etc.) that may be utilized to create,manufacture, and/or produce machine executable instructions. Forexample, the machine readable instructions may be fragmented and storedon one or more storage devices and/or computing devices (e.g., servers)located at the same or different locations of a network or collection ofnetworks (e.g., in the cloud, in edge devices, etc.). The machinereadable instructions may require one or more of installation,modification, adaptation, updating, combining, supplementing,configuring, decryption, decompression, unpacking, distribution,reassignment, compilation, etc. in order to make them directly readable,interpretable, and/or executable by a computing device and/or othermachine. For example, the machine readable instructions may be stored inmultiple parts, which are individually compressed, encrypted, and storedon separate computing devices, wherein the parts when decrypted,decompressed, and combined form a set of executable instructions thatimplement one or more functions that may together form a program such asthat described herein.
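As one illustration of instructions stored in multiple, individually compressed parts that form an executable program once decompressed and combined, consider the following sketch. The helper names, the three-way split, and the tiny example program are hypothetical; encryption and storage on separate computing devices are omitted for brevity.

```python
import zlib


def split_and_compress(source: str, n_parts: int) -> list[bytes]:
    """Fragment program text into roughly n_parts pieces and compress each one."""
    data = source.encode("utf-8")
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [zlib.compress(data[i:i + size]) for i in range(0, len(data), size)]


def recombine(parts: list[bytes]) -> str:
    """Decompress each fragment and join the fragments in their original order."""
    return b"".join(zlib.decompress(p) for p in parts).decode("utf-8")


# Hypothetical program text standing in for machine readable instructions.
PROGRAM = "def add(a, b):\n    return a + b\n"

parts = split_and_compress(PROGRAM, 3)  # each part could reside on a different device
namespace = {}
exec(recombine(parts), namespace)       # the combined parts form executable instructions
```

No single fragment is executable on its own here; only the decompressed and recombined whole defines the `add` function.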

In another example, the machine readable instructions may be stored in astate in which they may be read by processor circuitry, but requireaddition of a library (e.g., a dynamic link library (DLL)), a softwaredevelopment kit (SDK), an application programming interface (API), etc.in order to execute the instructions on a particular computing device orother device. In another example, the machine readable instructions mayneed to be configured (e.g., settings stored, data input, networkaddresses recorded, etc.) before the machine readable instructionsand/or the corresponding program(s) can be executed in whole or in part.Thus, machine readable media, as used herein, may include machinereadable instructions and/or program(s) regardless of the particularformat or state of the machine readable instructions and/or program(s)when stored or otherwise at rest or in transit.

The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.

As mentioned above, the example processes of FIGS. 3, 4 , and/or 5 maybe implemented using executable instructions (e.g., computer and/ormachine readable instructions) stored on a non-transitory computerand/or machine readable medium such as a hard disk drive, a flashmemory, a read-only memory, a compact disk, a digital versatile disk, acache, a random-access memory and/or any other storage device or storagedisk in which information is stored for any duration (e.g., for extendedtime periods, permanently, for brief instances, for temporarilybuffering, and/or for caching of the information). As used herein, theterm non-transitory computer readable medium is expressly defined toinclude any type of computer readable storage device and/or storage diskand to exclude propagating signals and to exclude transmission media.

“Including” and “comprising” (and all forms and tenses thereof) are usedherein to be open ended terms. Thus, whenever a claim employs any formof “include” or “comprise” (e.g., comprises, includes, comprising,including, having, etc.) as a preamble or within a claim recitation ofany kind, it is to be understood that additional elements, terms, etc.may be present without falling outside the scope of the correspondingclaim or recitation. As used herein, when the phrase “at least” is usedas the transition term in, for example, a preamble of a claim, it isopen-ended in the same manner as the term “comprising” and “including”are open ended. The term “and/or” when used, for example, in a form suchas A, B, and/or C refers to any combination or subset of A, B, C such as(1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) Bwith C, and (7) A with B and with C. As used herein in the context ofdescribing structures, components, items, objects and/or things, thephrase “at least one of A and B” is intended to refer to implementationsincluding any of (1) at least one A, (2) at least one B, and (3) atleast one A and at least one B. Similarly, as used herein in the contextof describing structures, components, items, objects and/or things, thephrase “at least one of A or B” is intended to refer to implementationsincluding any of (1) at least one A, (2) at least one B, and (3) atleast one A and at least one B. 
As used herein in the context ofdescribing the performance or execution of processes, instructions,actions, activities and/or steps, the phrase “at least one of A and B”is intended to refer to implementations including any of (1) at leastone A, (2) at least one B, and (3) at least one A and at least one B.Similarly, as used herein in the context of describing the performanceor execution of processes, instructions, actions, activities and/orsteps, the phrase “at least one of A or B” is intended to refer toimplementations including any of (1) at least one A, (2) at least one B,and (3) at least one A and at least one B.

As used herein, singular references (e.g., “a”, “an”, “first”, “second”,etc.) do not exclude a plurality. The term “a” or “an” item, as usedherein, refers to one or more of that item. The terms “a” (or “an”),“one or more”, and “at least one” can be used interchangeably herein.Furthermore, although individually listed, a plurality of means,elements or method actions may be implemented by, e.g., a single unit orprocessor. Additionally, although individual features may be included indifferent examples or claims, these may possibly be combined, and theinclusion in different examples or claims does not imply that acombination of features is not feasible and/or advantageous.

FIG. 3 is a flowchart representative of machine readable instructions 300 which may be executed to implement the privacy-protected cloud environment 106 of FIG. 2. The privacy-protected cloud environment 106 executes the machine readable instructions 300 to develop training data with which to train one or more machine learning models. For example, by using the AME panel data, the privacy-protected cloud environment 106 can identify training data relating demographic data of panelist primary users to demographic data of panelist registered users. In this manner, the privacy-protected cloud environment 106 may use the training data to train one or more machine learning models to correct demographic data for non-panelist registered users.

In the illustrated example of FIG. 3, the machine readable instructions 300 begin at block 302 where the impression-to-user analyzer 202 determines primary users from among multiple potential users of panelist user accounts of the database proprietor 102. Detailed machine readable instructions to determine primary users from among multiple potential users of panelist user accounts of the database proprietor 102 are illustrated and described in connection with FIG. 4. Additional or alternative detailed machine readable instructions to determine primary users from among multiple potential users of panelist user accounts of the database proprietor 102 are illustrated and described in connection with FIG. 5.

In the illustrated example of FIG. 3, at block 304, the model generator 140 generates a demographic correction model based on the identified primary users. For example, the model generator 140 executes a training algorithm (e.g., stochastic gradient descent) to train a machine learning model to correct demographics (e.g., self-reported age) of registered users of the database proprietor 102 using the demographics of the identified primary users as training data. In this manner, the model analyzer 144 may execute one or more trained demographic correction models to correct demographics of non-panelist registered users of the database proprietor 102.
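A minimal sketch of training by stochastic gradient descent is shown below. It assumes, purely for illustration, a one-feature linear model that maps a self-reported age to the age of the identified primary user; the disclosure does not specify the model form, the features, the learning rate, or the number of epochs, so all of these (and the function names) are hypothetical.

```python
import random


def train_age_correction(pairs, epochs=200, lr=0.01, seed=0):
    """Fit primary_age ~= w * declared_age + b by stochastic gradient descent.

    pairs: list of (self_declared_age, primary_user_age) from panelist accounts.
    Ages are scaled by 1/100 so that gradient steps stay numerically stable.
    """
    rng = random.Random(seed)
    data = [(d / 100.0, p / 100.0) for d, p in pairs]
    w, b = 1.0, 0.0  # start from the identity mapping (no correction)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y  # derivative of 0.5 * err**2 w.r.t. prediction
            w -= lr * err * x
            b -= lr * err
    return w, b


def predict_age(w, b, declared_age):
    """Apply the trained correction to a self-declared age (in years)."""
    return 100.0 * (w * (declared_age / 100.0) + b)


# Hypothetical training data: primary users are 5 years older than declared.
training_pairs = [(20, 25), (30, 35), (40, 45), (50, 55)]
w, b = train_age_correction(training_pairs)

mse_before = sum((d - p) ** 2 for d, p in training_pairs) / len(training_pairs)
mse_after = sum((predict_age(w, b, d) - p) ** 2 for d, p in training_pairs) / len(training_pairs)
```

In this toy setting, the trained parameters reduce the mean squared error relative to using the self-declared ages unchanged.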

FIG. 4 is a flowchart representative of machine readable instructions 302 which may be executed to implement the example impression-to-user analyzer 202 of FIG. 2 to determine primary users from among multiple potential users of panelist user accounts of the database proprietor 102. For example, multiple panelists of a panelist household may utilize the same user account to view media including advertisements. In some examples, processor circuitry executes the machine readable instructions 302 to implement the impression-to-user analyzer 202 to determine primary users from among multiple potential users of panelist user accounts of the database proprietor 102.

In the illustrated example of FIG. 4, the machine readable instructions 302 begin at block 402 where the score management controller 214 determines a first total score for a first panelist associated with a panelist user account. At block 404, the score management controller 214 determines a second total score for a second panelist associated with the panelist user account. In the example of FIG. 4, the respective total scores are based on respective impression scores, respective gender scores, and respective age scores of the two panelists.

In the illustrated example of FIG. 4, at block 406, in response to the score management controller 214 determining that the first total score satisfies (e.g., exceeds) a threshold, the network interface 206 stores demographics of the first panelist for the panelist user account (e.g., the network interface 206 is configured to store demographics of the first panelist for the panelist user account). In the example of FIG. 4, the threshold corresponds to a configurable value that may be tuned as necessary to achieve reliable data that provides accurate modeling. In this manner, the impression-to-user analyzer 202 determines the primary user of the panelist user account of the database proprietor 102 and adjusts demographic information of the panelist user account to reflect the primary user of the panelist user account.

FIG. 5 is another flowchart representative of machine readable instructions 302 which may be executed to implement the example impression-to-user analyzer 202 of FIG. 2 to determine primary users of panelist user accounts of the database proprietor 102. The machine readable instructions 302 begin at block 502 where the network interface 206 obtains the AME intermediary merged data from the AME intermediary merged data database 130.

In the illustrated example of FIG. 5, at block 504, the user account controller 208 selects a user account. At block 506, the impression controller 210 determines respective percentages of impressions for the selected panelist user account that are attributed to the panelists associated with the panelist user account. For example, multiple panelists may be associated with a single user account (e.g., multiple panelists in the same panelist household) such that the user account may be a shared user account where the registered user is not the same as the primary user. In some examples, the impression controller 210 determines the number of impressions logged for each user account registered with the database proprietor 102.

In the illustrated example of FIG. 5, to determine the number of impressions logged for each panelist associated with the user account, the impression controller 210 determines whether a panelist was served a prompt to self-identify and, if so, the panelist's response to that prompt. For example, the prompt may also ask the panelist to identify which other individuals are present at the time of the prompt. Based on the total number of impressions logged for the selected user account and the number of impressions logged for each panelist of the user account, the impression controller 210 determines the respective percentages of impressions for the user account that are attributed to the panelists.
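The percentage computation of block 506 can be sketched as below. The input layout (one panelist identifier per logged impression, taken from the self-identification responses) and the function name are assumptions made for illustration.

```python
from collections import Counter


def impression_percentages(impression_panelists):
    """Return the percentage of an account's impressions attributed to each panelist.

    impression_panelists: one panelist identifier per impression logged for the
    selected user account (hypothetical input layout).
    """
    counts = Counter(impression_panelists)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {panelist: 100.0 * n / total for panelist, n in counts.items()}


# Hypothetical shared account: 7 impressions by panelist A, 3 by panelist B.
shares = impression_percentages(["A"] * 7 + ["B"] * 3)
```

For the hypothetical shared account above, panelist A is attributed 70% of the impressions and panelist B 30%.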

In the illustrated example of FIG. 5, at block 508, the impression controller 210 defines respective impression scores for the panelists associated with the user account. In some examples, the higher the percentage determined at block 506, the higher the impression score. For example, the impression controller 210 determines the impression score for each panelist based on the percentage of impressions attributed to that panelist. In some examples, if between 60% and 80% of the impressions associated with the selected user account are attributed to a first panelist, the impression controller 210 assigns an impression score of four out of five (e.g., 4/5) for the first panelist.
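One possible mapping from impression percentage to impression score is sketched below. Only the 60% to 80% band mapping to a score of four comes from the example above; the remaining 20-point bands and the treatment of band boundaries are assumptions made to complete the illustration.

```python
def impression_score(pct: float) -> int:
    """Map a percentage of impressions (0-100) to a score from 1 to 5.

    Only the 60-80% -> 4 band is taken from the text; the other bands and
    the handling of exact boundaries are hypothetical.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if pct >= 80.0:
        return 5
    if pct >= 60.0:
        return 4
    if pct >= 40.0:
        return 3
    if pct >= 20.0:
        return 2
    return 1
```

Under these assumed bands, a panelist attributed 70% of an account's impressions receives a score of 4, and the score never decreases as the percentage increases.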

In the illustrated example of FIG. 5, at block 510, the demographic controller 212 compares a self-declared gender of the selected user account (indicated in the database proprietor impressions data) to the respective genders of the panelists as indicated from the AME panel data. At block 512, the demographic controller 212 defines respective gender scores for the panelists associated with the user account. In some examples, the gender score is assigned based on whether the self-declared gender of the selected user account matches the AME panel data, does not match the AME panel data, or is undetermined, unknown (e.g., when the self-declared gender is not provided), or non-binary. In some examples, a higher gender score is assigned when the self-declared gender matches the AME panel data.
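The three-way gender comparison of blocks 510 and 512 might be expressed as in the sketch below. The numeric score values and the string labels are hypothetical; the text only states that a match receives a higher score.

```python
from typing import Optional

# Hypothetical score values; the text only requires that a match scores higher.
MATCH_SCORE = 2
UNDETERMINED_SCORE = 1
MISMATCH_SCORE = 0


def gender_score(self_declared: Optional[str], panel_gender: str) -> int:
    """Score one panelist by comparing the account's self-declared gender
    with the panelist's gender from the panel data."""
    if self_declared is None or self_declared.lower() not in ("female", "male"):
        # Not provided, unknown, undetermined, or non-binary.
        return UNDETERMINED_SCORE
    if self_declared.lower() == panel_gender.lower():
        return MATCH_SCORE
    return MISMATCH_SCORE
```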

In the illustrated example of FIG. 5, at block 514, the demographic controller 212 compares the self-declared age of the selected user account (indicated in the database proprietor impressions data) to the respective ages of the panelists as indicated from the AME panel data. At block 516, the demographic controller 212 defines respective age scores for the panelists associated with the user account. In some examples, as the difference between the self-declared age and the age indicated by the AME panel data decreases, the age score increases.
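An age score that increases as the age difference shrinks could be sketched as follows. The five-point scale and the five-year step are assumptions chosen for illustration; the text does not specify the scoring function.

```python
def age_score(self_declared_age: int, panel_age: int,
              max_score: int = 5, years_per_point: int = 5) -> int:
    """Return a score that rises as the gap between the account's self-declared
    age and the panelist's age narrows.

    The scale (max_score) and step (years_per_point) are hypothetical.
    """
    gap = abs(self_declared_age - panel_age)
    return max(0, max_score - gap // years_per_point)
```

With these assumed parameters, an exact match scores 5, a 12-year gap scores 3, and gaps of 25 years or more score 0.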

In the illustrated example of FIG. 5, at block 518, the score management controller 214 determines respective total scores for the panelists based on the respective impression scores, the respective gender scores, and the respective age scores of the panelists. At block 520, the score management controller 214 determines whether any of the respective total scores is based on a percentage of impressions that satisfies a first threshold. For example, the score management controller 214 determines whether any of the respective total scores is based on a percentage of impressions that exceeds a first threshold of 50%.

In the illustrated example of FIG. 5, in response to the score management controller 214 determining that none of the respective total scores are based on a percentage of impressions that satisfies the first threshold (block 520: NO), the machine readable instructions 302 proceed to block 526. In response to the score management controller 214 determining that at least one of the respective total scores is based on a percentage of impressions that satisfies the first threshold (block 520: YES), the machine readable instructions 302 proceed to block 522. At block 522, the score management controller 214 selects a highest one of the total scores that satisfy the first threshold. In examples disclosed herein, the first threshold is 50%, but in some examples other percentages may be used (e.g., 30%, 40%, etc.). Thus, in such cases, the score management controller 214 selects the highest one of the total scores that satisfy the first threshold.

In the illustrated example of FIG. 5, at block 524, the score management controller 214 determines whether the selected total score satisfies a second threshold. In examples disclosed herein, the selected total score satisfying the second threshold indicates that the selected total score is high enough to achieve reliable data that provides accurate modeling. In response to the score management controller 214 determining that the selected total score does not satisfy the second threshold (block 524: NO), the machine readable instructions 302 proceed to block 526. In response to the score management controller 214 determining that the selected total score satisfies the second threshold (block 524: YES), the machine readable instructions 302 proceed to block 528.

In the illustrated example of FIG. 5, at block 526, the score management controller 214 disregards the selected user account. At block 528, the network interface 206 stores demographics of the panelist with the selected total score for the user account (e.g., in the user-level covariate database 204). At block 530, the user account controller 208 determines whether there is another panelist user account to be analyzed in the AME intermediary merged data. In response to the user account controller 208 determining that there is another panelist user account to be analyzed in the AME intermediary merged data (block 530: YES), the machine readable instructions 302 proceed to block 532. At block 532, the user account controller 208 selects a next user account. Thereafter, the machine readable instructions 302 return to block 506. In response to the user account controller 208 determining that there is not another panelist user account to be analyzed in the AME intermediary merged data (block 530: NO), the machine readable instructions 302 return to the machine readable instructions 300 at block 304.
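The decision flow of blocks 518 through 528 for a single user account can be sketched as follows. The text states only that the total score is based on the three component scores; summing them is an assumption here, as are the class and function names and the value of the second threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PanelistScore:
    panelist_id: str
    impression_pct: float   # percentage determined at block 506
    impression_score: int   # block 508
    gender_score: int       # block 512
    age_score: int          # block 516

    @property
    def total(self) -> int:
        # Block 518. Summation is an assumption; the text only says "based on".
        return self.impression_score + self.gender_score + self.age_score


def select_primary_user(scores: List[PanelistScore],
                        first_threshold: float = 50.0,
                        second_threshold: int = 8) -> Optional[str]:
    """Return the primary user's panelist ID, or None to disregard the account.

    second_threshold=8 is a hypothetical tuning value.
    """
    # Block 520: keep panelists whose impression percentage exceeds the first threshold.
    eligible = [s for s in scores if s.impression_pct > first_threshold]
    if not eligible:
        return None                      # block 526: disregard the account
    best = max(eligible, key=lambda s: s.total)   # block 522
    if best.total < second_threshold:    # block 524: NO
        return None                      # block 526
    return best.panelist_id              # block 528: store this panelist's demographics


account = [
    PanelistScore("A", 70.0, 4, 2, 5),
    PanelistScore("B", 30.0, 2, 0, 1),
]
primary = select_primary_user(account)
```

With a first threshold of 50% and an "exceeds" comparison, at most one panelist can be eligible, so block 522 matters mainly when a lower first threshold (e.g., 30% or 40%) is configured.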

FIG. 6 illustrates the example system 100 of FIG. 1. However, some aspects of FIG. 1 are omitted from FIG. 6 for the sake of clarity to discuss the process by which the AME is able to correct for misattribution within panel data in situations where the actual users of the panelist client devices 112 are not known (e.g., the user was not prompted to self-identify). Users are typically not prompted to self-identify when they are using mobile devices. Accordingly, the illustrated example is described with respect to mobile devices. However, teachings disclosed herein to correct misattribution of panel data may be applied to any type of panelist client device 112. As represented in the illustrated example of FIG. 6, the AME 104 performs one or more surveys of a subset of panelists that use mobile devices to elicit survey responses 602. In some examples, the surveys may be electronically administered (e.g., via the panelist client devices 112 and/or other computing device) to ask the panelists about their behavior and usage of the panelist client devices 112 through which the AME 104 collects the AME panel data. In some examples, the survey is designed to collect survey responses providing information about whether other people besides the panelist use the panelist client device; if so, how often each person uses the device; when (e.g., day part, time of day, time of week, etc.) each person uses the device; the types and/or categories of websites and/or applications used and/or visited by each user of the device; the type, genre, and/or characteristics of videos and/or other content accessed by each user of the device; and so forth. As shown in the illustrated example, the survey responses 602 are collected by an AME panel data collection system 604 and stored in a survey data database 606. 
The survey data may include additional information the AME 104 already knows about the panelist such as who the primary user of the device is, the type of the device (e.g., tablet versus smartphone), the demographic composition (e.g., ages, genders, etc.) of all individuals in the panelist household, and/or any other relevant information. In this example, the AME panel data collection system 604 is a separate system from the computing apparatus 152 shown in FIG. 1 and omitted in FIG. 6 for the sake of clarity. However, in other examples, the AME panel data collection system 604 may correspond to and/or include the computing apparatus 152.

In some examples, the survey data serves as truth data that is used byan individualization model generator 608 to train and validate a model(referred to herein as an individualization model) that can predict thetrue user of a mobile device (when such information is unavailable)based on information that the AME 104 does know for certain and/or hasaccess to through the collection of panel data via the meter 115. Oncethe individualization model has been trained and validated based on theAME panel data, the model is stored in an individualization modelsdatabase 610. In some examples, an individualization model analyzer 612may implement the individualization model to generate predictions and/orinferences as to the actual users associated with media impressionscaptured by the audience measurement meters 115 of associated mobilepanelist client devices 112. That is, in some examples, as shown in FIG.1 , the individualization model analyzer 612 uses the individualizationmodel to analyze the usage behavior (at the time of particular mediaimpressions) reported from a meter 115 on a particular panelist clientdevice 112. This analysis is performed in conjunction with knowninformation about the demographic composition of the household of thepanelist(s) associated with the device to predict the actual user of thepanelist client device 112 at the time of the media impression ratherthan automatically associating the impression with the primary user ofthe panelist client device 112.

In the illustrated example of FIG. 6, the output of the individualization model analyzer 612 is passed to a device-level individualized data database 614 in the AME first party data store 124 to enable the data to be used to generate an impression-level individualized data database 616 within the AME privacy-protected data store 132 of the privacy-protected cloud environment 106. In some examples, the output of the individualization model analyzer 612 may be stored locally by the AME panel data collection system 604 before the data is provided to the device-level individualized data database 614.

As represented in FIG. 6, the information in the impression-level individualized data database 616 results from the combination of the device-level individualized data (in the device-level individualized data database 614) and the AME intermediary merged data (in the AME intermediary merged data database 130). That is, the actual user for a particular impression (determined from the device-level individualized data) is matched with the AME panel data that has been enriched by the covariates provided by the database proprietor 102.

In some examples, only the predictions for the feature combinations analyzed by the individualization model analyzer 612 (e.g., daypart, genre, category, device type, etc.) that match the feature combinations (e.g., covariates) provided by the database proprietor 102 are retained in the impression-level individualized data database 616. For example, assume that daypart is the only feature used in the individualization model. Further assume that for a particular panelist client device 112, the individualization model predicts that person A is the actual user of the device when the daypart=morning, person B is the actual user of the device when the daypart=midday, and person C is the actual user of the device when the daypart=evening. All of these predictions are stored within the device-level individualized data database 614 in connection with the particular panelist client device 112. Now assume that the database proprietor impressions data indicates that a particular impression that was logged for the particular panelist client device 112 occurred when daypart=midday. In such a situation, person B would be assigned as the actual user for that particular impression in the impression-level individualized data database 616 and the predictions for the morning and evening dayparts would not be used (at least for that particular impression). Once the AME intermediary merged data has been corrected in this manner, the process to calculate adjustment factors and perform other analyses as disclosed herein proceeds in a similar manner as outlined above.
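The daypart example above amounts to a lookup of device-level predictions keyed by the covariate reported for an impression. A minimal sketch follows; the dictionary layout, the device identifier, and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional

# Device-level predictions: device ID -> daypart -> predicted actual user.
# The identifiers are hypothetical and mirror the persons A, B, and C in the text.
device_level_predictions: Dict[str, Dict[str, str]] = {
    "device-1": {"morning": "A", "midday": "B", "evening": "C"},
}


def assign_actual_user(predictions: Dict[str, Dict[str, str]],
                       device_id: str, daypart: str) -> Optional[str]:
    """Return the predicted user whose daypart matches the impression's covariate.

    Predictions for other dayparts are not used for this impression. None is
    returned when no matching prediction exists.
    """
    return predictions.get(device_id, {}).get(daypart)


# An impression logged for the device during the midday daypart.
actual_user = assign_actual_user(device_level_predictions, "device-1", "midday")
```

Here the midday impression is attributed to person B, while the morning and evening predictions are left unused for that impression.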

FIG. 7 is a block diagram of an example processor platform 700 structured to execute the instructions of FIGS. 3, 4, and/or 5 to implement the privacy-protected cloud environment 106 and/or the impression-to-user analyzer 202 of FIG. 2. The processor platform 700 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), or any other type of computing device.

The processor platform 700 of the illustrated example includes a processor 712. The processor 712 of the illustrated example is hardware. For example, the processor 712 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor 712 may be a semiconductor based (e.g., silicon based) device. In this example, the processor 712 implements the example network interface 206, the example user account controller 208, the example impression controller 210, the example demographic controller 212, the example score management controller 214, and the model generator 140.

The processor 712 of the illustrated example includes a local memory 713 (e.g., a cache). The processor 712 of the illustrated example is in communication with a main memory including a volatile memory 714 and a non-volatile memory 716 via a bus 718. The volatile memory 714 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 716 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 714, 716 is controlled by a memory controller.

The processor platform 700 of the illustrated example also includes an interface circuit 720. The interface circuit 720 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.

In the illustrated example, one or more input devices 722 are connected to the interface circuit 720. The input device(s) 722 permit(s) a user to enter data and/or commands into the processor 712. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.

One or more output devices 724 are also connected to the interface circuit 720 of the illustrated example. The output devices 724 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 720 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.

The interface circuit 720 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 726. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.

The processor platform 700 of the illustrated example also includes one or more mass storage devices 728 for storing software and/or data. Examples of such mass storage devices 728 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.

The machine executable instructions 732 of FIGS. 3, 4, and/or 5 may be stored in the mass storage device 728, in the volatile memory 714, in the non-volatile memory 716, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.

From the foregoing, it will be appreciated that example methods, apparatus, and articles of manufacture have been disclosed that enable the generation of accurate and reliable audience measurement metrics for Internet-based media without the use of third-party cookies and/or tags that have been the standard approach for monitoring Internet media for many years. This is accomplished by merging AME panel data with database proprietor impressions data within a privacy-protected cloud based environment. The nature of the cloud environment and the privacy constraints imposed thereon as well as the nature in which the database proprietor collects the database proprietor impression data present technological challenges contributing to limitations in the reliability and/or completeness of the data.

However, examples disclosed herein overcome these difficulties by generating adjustment factors and/or machine learning models based on the AME panel data. The disclosed examples correct for misreported demographic data associated with a user account by assessing differences between demographic data provided by database proprietors and similar demographic data provided by an AME. For example, the disclosed methods, apparatus, and articles of manufacture correct computer errors in audience metrics generated as a result of registered users reporting incorrect demographics. Registered users can purposefully and/or inadvertently report incorrect demographic information due to the nature and medium of reporting being electronic (e.g., via a computer and/or the Internet). The disclosed methods, apparatus, and articles of manufacture correct for this error by establishing a reliable training dataset relating panelist primary users to panelist registered users. Having established a reliable training dataset, examples disclosed herein train one or more machine learning models to correct demographic information for non-panelist registered users.

Example methods, apparatus, systems, and articles of manufacture to adjust demographic information of user accounts to reflect primary users of the user accounts are disclosed herein. Further examples and combinations thereof include the following:

Example 1 includes an apparatus comprising memory, and processor circuitry to execute instructions that cause the processor circuitry to at least determine a first total score for a first panelist associated with a panelist user account based on at least one of a first impression score, a first age score, or a first gender score, determine a second total score for a second panelist associated with the panelist user account based on at least one of a second impression score, a second age score, or a second gender score, and in response to determining that the first total score satisfies a threshold, store demographics of the first panelist for the panelist user account.

Example 2 includes the apparatus of example 1, wherein the processor circuitry is to determine a first percentage of impressions for the panelist user account that are attributed to the first panelist, determine a second percentage of impressions for the panelist user account that are attributed to the second panelist, define the first impression score for the first panelist based on the first percentage of impressions, and define the second impression score for the second panelist based on the second percentage of impressions.

Example 3 includes the apparatus of example 1, wherein the processor circuitry is to compare a self-declared gender of the panelist user account to a first gender of the first panelist and a second gender of the second panelist, define the first gender score for the first panelist based on the comparison, and define the second gender score for the second panelist based on the comparison.

Example 4 includes the apparatus of example 1, wherein the processor circuitry is to compare a self-declared age of the panelist user account to a first age of the first panelist and a second age of the second panelist, define the first age score for the first panelist based on the comparison, and define the second age score for the second panelist based on the comparison.

Example 5 includes the apparatus of example 1, wherein the processor circuitry is to determine whether the first total score satisfies the threshold.

Example 6 includes the apparatus of example 1, wherein the processor circuitry is to, in response to determining that the first total score does not satisfy the threshold, disregard the panelist user account.

Example 7 includes the apparatus of example 1, wherein the threshold is a first threshold and the processor circuitry is to determine whether a first percentage of impressions for the panelist user account that are attributed to the first panelist and on which the first total score is based satisfies a second threshold, and in response to the first total score satisfying the first threshold and the first percentage of impressions satisfying the second threshold, store the demographics of the first panelist for the panelist user account.

Example 8 includes the apparatus of example 1, wherein the processor circuitry is to generate one or more demographic correction models based on the demographics of the first panelist.

Example 9 includes an apparatus comprising a score management controller to determine a first total score for a first panelist associated with a panelist user account based on at least one of a first impression score, a first age score, or a first gender score, and determine a second total score for a second panelist associated with the panelist user account based on at least one of a second impression score, a second age score, or a second gender score, and a network interface to, in response to determining that the first total score satisfies a threshold, store demographics of the first panelist for the panelist user account.

Example 10 includes the apparatus of example 9, further including an impression controller to determine a first percentage of impressions for the panelist user account that are attributed to the first panelist, determine a second percentage of impressions for the panelist user account that are attributed to the second panelist, define the first impression score for the first panelist based on the first percentage of impressions, and define the second impression score for the second panelist based on the second percentage of impressions.

Example 11 includes the apparatus of example 9, further including a demographic controller to compare a self-declared gender of the panelist user account to a first gender of the first panelist and a second gender of the second panelist, define the first gender score for the first panelist based on the comparison, and define the second gender score for the second panelist based on the comparison.

Example 12 includes the apparatus of example 9, further including a demographic controller to compare a self-declared age of the panelist user account to a first age of the first panelist and a second age of the second panelist, define the first age score for the first panelist based on the comparison, and define the second age score for the second panelist based on the comparison.

Example 13 includes the apparatus of example 9, wherein the score management controller is to determine whether the first total score satisfies the threshold.

Example 14 includes the apparatus of example 9, wherein the score management controller is to, in response to determining that the first total score does not satisfy the threshold, disregard the panelist user account.

Example 15 includes the apparatus of example 9, wherein the threshold is a first threshold, the score management controller is to determine whether a first percentage of impressions for the panelist user account that are attributed to the first panelist and on which the first total score is based satisfies a second threshold, and the network interface is to, in response to the first total score satisfying the first threshold and the first percentage of impressions satisfying the second threshold, store the demographics of the first panelist for the panelist user account.

Example 16 includes the apparatus of example 9, further including a model generator to generate one or more demographic correction models based on the demographics of the first panelist.

Example 17 includes a non-transitory computer readable storage medium comprising instructions which, when executed, cause at least one processor to at least determine a first total score for a first panelist associated with a panelist user account based on at least one of a first impression score, a first age score, or a first gender score, determine a second total score for a second panelist associated with the panelist user account based on at least one of a second impression score, a second age score, or a second gender score, and in response to determining that the first total score satisfies a threshold, store demographics of the first panelist for the panelist user account.

Example 18 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to determine a first percentage of impressions for the panelist user account that are attributed to the first panelist, determine a second percentage of impressions for the panelist user account that are attributed to the second panelist, define the first impression score for the first panelist based on the first percentage of impressions, and define the second impression score for the second panelist based on the second percentage of impressions.

Example 19 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to compare a self-declared gender of the panelist user account to a first gender of the first panelist and a second gender of the second panelist, define the first gender score for the first panelist based on the comparison, and define the second gender score for the second panelist based on the comparison.

Example 20 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to compare a self-declared age of the panelist user account to a first age of the first panelist and a second age of the second panelist, define the first age score for the first panelist based on the comparison, and define the second age score for the second panelist based on the comparison.

Example 21 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to determine whether the first total score satisfies the threshold.

Example 22 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to, in response to determining that the first total score does not satisfy the threshold, disregard the panelist user account.

Example 23 includes the non-transitory computer readable storage medium of example 17, wherein the threshold is a first threshold and the instructions, when executed, cause the at least one processor to determine whether a first percentage of impressions for the panelist user account that are attributed to the first panelist and on which the first total score is based satisfies a second threshold, and in response to the first total score satisfying the first threshold and the first percentage of impressions satisfying the second threshold, store the demographics of the first panelist for the panelist user account.

Example 24 includes the non-transitory computer readable storage medium of example 17, wherein the instructions, when executed, cause the at least one processor to generate one or more demographic correction models based on the demographics of the first panelist.

Example 25 includes a method comprising determining a first total score for a first panelist associated with a panelist user account based on at least one of a first impression score, a first age score, or a first gender score, determining a second total score for a second panelist associated with the panelist user account based on at least one of a second impression score, a second age score, or a second gender score, and in response to determining that the first total score satisfies a threshold, storing demographics of the first panelist for the panelist user account.

Example 26 includes the method of example 25, further including determining a first percentage of impressions for the panelist user account that are attributed to the first panelist, determining a second percentage of impressions for the panelist user account that are attributed to the second panelist, defining the first impression score for the first panelist based on the first percentage of impressions, and defining the second impression score for the second panelist based on the second percentage of impressions.

Example 27 includes the method of example 25, further including comparing a self-declared gender of the panelist user account to a first gender of the first panelist and a second gender of the second panelist, defining the first gender score for the first panelist based on the comparison, and defining the second gender score for the second panelist based on the comparison.

Example 28 includes the method of example 25, further including comparing a self-declared age of the panelist user account to a first age of the first panelist and a second age of the second panelist, defining the first age score for the first panelist based on the comparison, and defining the second age score for the second panelist based on the comparison.

Example 29 includes the method of example 25, further including determining whether the first total score satisfies the threshold.

Example 30 includes the method of example 25, further including, in response to determining that the first total score does not satisfy the threshold, disregarding the panelist user account.

Example 31 includes the method of example 25, wherein the threshold is a first threshold and the method further includes determining whether a first percentage of impressions for the panelist user account that are attributed to the first panelist and on which the first total score is based satisfies a second threshold, and in response to the first total score satisfying the first threshold and the first percentage of impressions satisfying the second threshold, storing the demographics of the first panelist for the panelist user account.

Example 32 includes the method of example 25, further including generating one or more demographic correction models based on the demographics of the first panelist.

Example 33 includes an apparatus comprising means for managing scores to determine a first total score for a first panelist associated with a panelist user account based on at least one of a first impression score, a first age score, or a first gender score, and determine a second total score for a second panelist associated with the panelist user account based on at least one of a second impression score, a second age score, or a second gender score, and means for interfacing to, in response to determining that the first total score satisfies a threshold, store demographics of the first panelist for the panelist user account.

Example 34 includes the apparatus of example 33, further including means for managing impressions to determine a first percentage of impressions for the panelist user account that are attributed to the first panelist, determine a second percentage of impressions for the panelist user account that are attributed to the second panelist, define the first impression score for the first panelist based on the first percentage of impressions, and define the second impression score for the second panelist based on the second percentage of impressions.

Example 35 includes the apparatus of example 33, further including means for managing demographics to compare a self-declared gender of the panelist user account to a first gender of the first panelist and a second gender of the second panelist, define the first gender score for the first panelist based on the comparison, and define the second gender score for the second panelist based on the comparison.

Example 36 includes the apparatus of example 33, further including means for managing demographics to compare a self-declared age of the panelist user account to a first age of the first panelist and a second age of the second panelist, define the first age score for the first panelist based on the comparison, and define the second age score for the second panelist based on the comparison.

Example 37 includes the apparatus of example 33, wherein the means for managing scores are to determine whether the first total score satisfies the threshold.

Example 38 includes the apparatus of example 33, wherein the means for managing scores are to, in response to determining that the first total score does not satisfy the threshold, disregard the panelist user account.

Example 39 includes the apparatus of example 33, wherein the threshold is a first threshold, the means for managing scores are to determine whether a first percentage of impressions for the panelist user account that are attributed to the first panelist and on which the first total score is based satisfies a second threshold, and the means for interfacing are to, in response to the first total score satisfying the first threshold and the first percentage of impressions satisfying the second threshold, store the demographics of the first panelist for the panelist user account.

Example 40 includes the apparatus of example 33, further including means for generating models to generate one or more demographic correction models based on the demographics of the first panelist.
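By way of illustration only, the scoring and threshold logic recited in the foregoing examples can be sketched in Python as follows. The sketch rests on assumptions that are not specified in the examples: the three scores are weighted equally, the age score uses a hypothetical tolerance of five years, a threshold is "satisfied" when a value is greater than or equal to it, the highest-scoring panelist is the one tested against the thresholds, and the threshold values (2.0 and 0.5) are arbitrary. All class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Panelist:
    name: str
    age: int
    gender: str
    impressions: int  # impressions on the shared account attributed to this panelist


def total_scores(panelists: List[Panelist], declared_age: int,
                 declared_gender: str, age_tolerance: int = 5) -> Dict[str, float]:
    """Sum an impression, age, and gender score per panelist (equal weights assumed)."""
    total_impressions = sum(p.impressions for p in panelists)
    scores: Dict[str, float] = {}
    for p in panelists:
        # Impression score: this panelist's share of the account's impressions.
        impression_score = p.impressions / total_impressions if total_impressions else 0.0
        # Age and gender scores: 1.0 on agreement with the self-declared value, else 0.0.
        age_score = 1.0 if abs(p.age - declared_age) <= age_tolerance else 0.0
        gender_score = 1.0 if p.gender == declared_gender else 0.0
        scores[p.name] = impression_score + age_score + gender_score
    return scores


def select_primary_user(panelists: List[Panelist], declared_age: int,
                        declared_gender: str, score_threshold: float = 2.0,
                        share_threshold: float = 0.5) -> Optional[Panelist]:
    """Return the top-scoring panelist if both thresholds are met, else None."""
    if not panelists:
        return None
    scores = total_scores(panelists, declared_age, declared_gender)
    best = max(panelists, key=lambda p: scores[p.name])
    total_impressions = sum(p.impressions for p in panelists)
    share = best.impressions / total_impressions if total_impressions else 0.0
    if scores[best.name] >= score_threshold and share >= share_threshold:
        return best  # this panelist's demographics would be stored for the account
    return None  # the panelist user account is disregarded


household = [Panelist("Alex", 42, "F", 70), Panelist("Sam", 15, "M", 30)]
primary = select_primary_user(household, declared_age=40, declared_gender="F")
print(primary.name if primary else "account disregarded")
```

In this sketch, returning None corresponds to disregarding the panelist user account, and a returned panelist corresponds to the panelist whose demographics would be stored for the account and later used to generate demographic correction models.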

Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.

What is claimed is:
 1. An apparatus comprising: memory; programmable circuitry; and instructions in the memory, the instructions to cause the programmable circuitry to at least: access impression data associated with a user account registered with a database proprietor, the user account associated with first demographics at a database of the database proprietor; determine a primary user of the user account based on the impression data and based on second demographics of multiple users of the user account, the multiple users including the primary user; and modify the first demographics associated with the user account based on at least some of the second demographics, the at least some of the second demographics corresponding to the primary user.
 2. The apparatus of claim 1, wherein the programmable circuitry is to match a panelist of an audience measurement entity (AME) to a registered user of the database proprietor based on an impression identifier, the impression identifier logged by a first server of the AME and a second server of the database proprietor, the registered user corresponding to the first demographics.
 3. The apparatus of claim 2, wherein the impression data is first impression data, and the programmable circuitry is to combine second impression data associated with the panelist and third impression data associated with the registered user to generate the first impression data, the second impression data collected by the first server of the AME, the third impression data collected by the second server of the database proprietor.
 4. The apparatus of claim 2, wherein the user account is shared by the multiple users.
 5. The apparatus of claim 1, wherein the programmable circuitry is to: determine percentages of impressions represented in the impression data that are attributed to corresponding ones of the multiple users of the user account; and determine the primary user of the user account based on the percentages of the impressions.
 6. The apparatus of claim 1, wherein to modify the first demographics associated with the user account based on the at least some of the second demographics, the programmable circuitry is to assign the at least some of the second demographics to the user account.
 7. The apparatus of claim 1, wherein the at least some of the second demographics include an age of the primary user, a gender of the primary user, a race of the primary user, an ethnicity of the primary user, a level of education of the primary user, an employment status of the primary user, an income level of the primary user, or a geographic location of residence of the primary user.
 8. A non-transitory computer readable storage medium comprising instructions which, when executed, cause programmable circuitry to at least: access impression data associated with a user account registered with a database proprietor, the user account associated with first demographics at a database of the database proprietor; determine a primary user of the user account based on the impression data and based on second demographics of multiple users of the user account, the multiple users including the primary user; and modify the first demographics associated with the user account based on at least some of the second demographics, the at least some of the second demographics corresponding to the primary user.
 9. The non-transitory computer readable storage medium of claim 8, wherein the instructions are to cause the programmable circuitry to match a panelist of an audience measurement entity (AME) to a registered user of the database proprietor based on an impression identifier, the impression identifier logged by a first computer of the AME and a second computer of the database proprietor, the registered user corresponding to the first demographics.
 10. The non-transitory computer readable storage medium of claim 9, wherein the impression data is first impression data, and the instructions are to cause the programmable circuitry to combine second impression data associated with the panelist and third impression data associated with the registered user to generate the first impression data, the second impression data collected by the first computer of the AME, the third impression data collected by the second computer of the database proprietor.
 11. The non-transitory computer readable storage medium of claim 9, wherein the user account is shared by the multiple users.
 12. The non-transitory computer readable storage medium of claim 8, wherein the instructions are to cause the programmable circuitry to: determine percentages of impressions represented in the impression data that are attributed to corresponding ones of the multiple users of the user account; and determine the primary user of the user account based on the percentages of the impressions.
 13. The non-transitory computer readable storage medium of claim 8, wherein to modify the first demographics associated with the user account based on the at least some of the second demographics, the instructions are to cause the programmable circuitry to assign the at least some of the second demographics to the user account.
 14. The non-transitory computer readable storage medium of claim 8, wherein the at least some of the second demographics include an age of the primary user, a gender of the primary user, a race of the primary user, an ethnicity of the primary user, a level of education of the primary user, an employment status of the primary user, an income level of the primary user, or a geographic location of residence of the primary user.
 15. A method comprising: accessing impression data associated with a user account registered with a database proprietor, the user account having been associated with first demographics by the database proprietor; determining, by executing an instruction with processor circuitry, a primary user of the user account based on the impression data and based on second demographics of multiple users of the user account, the multiple users including the primary user; and modifying, by executing an instruction with the processor circuitry, the first demographics associated with the user account based on at least some of the second demographics, the at least some of the second demographics corresponding to the primary user.
 16. The method of claim 15, further including matching a panelist of an audience measurement entity (AME) to a registered user of the database proprietor based on an impression identifier, the impression identifier logged in a first database at the AME and in a second database at the database proprietor, the registered user corresponding to the first demographics.
 17. The method of claim 16, wherein the impression data is first impression data, and the method further includes combining second impression data associated with the panelist and third impression data associated with the registered user to generate the first impression data, the second impression data collected in the first database of the AME, the third impression data collected in the second database of the database proprietor.
 18. The method of claim 16, wherein the user account is shared by the multiple users.
 19. The method of claim 15, further including: determining percentages of impressions represented in the impression data that are attributed to corresponding ones of the multiple users of the user account; and determining the primary user of the user account based on the percentages of the impressions.
 20. The method of claim 15, wherein modifying the first demographics associated with the user account based on the at least some of the second demographics includes assigning the at least some of the second demographics to the user account.
 21. The method of claim 15, wherein the at least some of the second demographics include an age of the primary user, a gender of the primary user, a race of the primary user, an ethnicity of the primary user, a level of education of the primary user, an employment status of the primary user, an income level of the primary user, or a geographic location of residence of the primary user.